Click on one of the categories below to jump to the corresponding link section. Categories:
- Benchmarking and data sets
- Algorithms for Graph Matching
- Graph Drawing
- Combinatorial maps
- Graph kernels
- Other TCs,
- Various links
- The Graph Database <http://mivia.unisa.it/datasets/graph-database>
- IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning
- Chemistry datasets provided either by the GREYC laboratory or by the chemoinformatics web site.
- The LEMS database of shapes
- ILPIso datasets
- The Graph Data Repository For Graph Edit Distance (GDR4GED)
- Results of ICPR 2016 contest on Graph edit distance
- Challenge 1: computation of the exact or an approximate graph edit distance
- Challenge 2: computation of a dissimilarity measure for graph classification
The ARG Database is a huge collection of labeled and unlabeled graphs realized by the MIVIA Lab of the University of Salerno (Italy). The aim of this collection is to provide the graph research community with a standard test ground for the benchmarking of graph matching algorithms. The database is organized in two sections: labeled and unlabeled graphs.
Both labeled and unlabeled graphs have been randomly generated according to different generation models (2D, 3D and 4D regular meshes; 2D, 3D and 4D nearly-regular meshes; bounded valence; nearly-bounded valence; randomly generated), each involving different possible parameter settings.
As a result, the database contains 168 different kinds of graphs. Each kind of unlabeled graph is represented by thousands of pairs of graphs for which an isomorphism or a graph-subgraph isomorphism relation holds, for a total of 143,600 graphs. Each kind of labeled graph is represented by thousands of pairs of graphs sharing a non-trivial common subgraph, for a total of 166,000 graphs. Graph sizes range from a few nodes up to thousands.
The above website introduces a repository of graph data sets and corresponding benchmarks, covering a wide spectrum of applications. The authors expect the repository to make a major contribution towards promoting graph-based representations and making graph-based pattern recognition and machine learning algorithms more readily comparable with each other. They plan to expand the repository further, and encourage the community not only to use the available sets for developing and testing their algorithms, but also to contribute their own graph sets.
This repository is well documented in the following paper:
Riesen, K., Bunke, H.: IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning. In: da Vitoria Lobo, N., et al. (eds.) Structural, Syntactic, and Statistical Pattern Recognition. LNCS 5342, pp. 287-297. Springer (2008)
Several databases of graphs corresponding to classification or regression problems in chemoinformatics.
The Vision Group at LEMS provides several shape databases composed of 99, 216 and 1032 shapes. These databases are essential for anyone dealing with shape recognition. Interested readers may also have a look at this small database of 33 holed shapes, which may (and should) be used in conjunction with one of the LEMS databases.
Four synthetic datasets are provided. Each is composed of several triplets (pattern graph, target graph, ground truth). The graphs have been randomly generated with the Erdos-Renyi model, with or without the constraint of producing connected graphs. For each option, one version of the dataset has an exact mapping (equal labels between matched vertices/edges), whereas another version includes noise on the label values. The ground-truth information gives the one-to-one vertex mapping achieving the minimal assignment cost.
An additional dataset is composed of graphs generated from a real application. Target graphs are region adjacency graphs representing architectural plans; pattern graphs are structural representations of graphical symbols. The ground-truth information gives the instances of the patterns in the targets, which correspond to instances of the symbols on the plans.
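The triplet structure described above can be sketched in a few lines. This is not the ILPIso dataset generator, just a hedged illustration (using networkx's Erdos-Renyi generator, with labels and sizes chosen arbitrarily) of how a (pattern, target, ground truth) triplet with an exact mapping could be built:

```python
# Sketch (not the ILPIso generator): build one (pattern, target, ground-truth)
# triplet from an Erdos-Renyi graph, assuming integer vertex labels.
import random
import networkx as nx

random.seed(0)

# Target graph: Erdos-Renyi G(n, p) with random vertex labels.
target = nx.gnp_random_graph(10, 0.4, seed=0)
for v in target.nodes:
    target.nodes[v]["label"] = random.randint(0, 4)

# Pattern graph: an induced subgraph of the target (the exact-mapping case;
# the noisy version would additionally perturb the copied labels).
kept = random.sample(list(target.nodes), 5)
pattern = target.subgraph(kept).copy()

# Ground truth: the one-to-one vertex mapping (here the identity on kept vertices).
groundtruth = {v: v for v in kept}
print(len(pattern), len(target), groundtruth)
```

Since the pattern is an induced subgraph, the identity mapping has cost zero, which is exactly the "equal labels between matched vertices/edges" setting mentioned above.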
To precisely evaluate the accuracy of graph edit distance (GED) methods, ground truth describing the quality of the matching between pairs of graphs (low-level information) is required, in addition to the higher-level information (label of the object represented by the graph, classification rates) usually provided and used as ground truth. Most publicly available graph repositories with associated ground truth are dedicated to evaluating graph classification or exact graph matching methods, so neither the matching correspondences nor the distance between each pair of graphs is evaluated directly.
The GDR4GED dataset is a graph database repository annotated with low-level information: graph edit distance values (optimal and sub-optimal) together with the corresponding node-to-node matchings (edit paths). A set of performance evaluation metrics that can be computed from this dataset to assess GED methods is also proposed in the conference paper (GbR 2015) describing the repository (available with the repository).
Measuring the dissimilarity between graphs is a key step in data analysis based on graph representations. This contest focuses on the definition and the computation of general dissimilarity measures through two challenges:
- LAD Software <http://liris.cnrs.fr/csolnon/LAD.html>
- VF Library <http://mivia.unisa.it/datasets/graph-database/vflib/>
- GUB Library <http://amalfi.dis.unina.it/graph/misc/gub_tool.tar>
- Graphm package <http://cbio.ensmp.fr/graphm/>
- A software toolkit for graph edit distance computation.
- A* algorithm with bipartite Heuristic (exact graph edit distance)
- Beam search
- Bipartite Graph edit distance using Assignment Algorithms
LAD is a solver (written in C) for the subgraph isomorphism problem: deciding whether a copy of a pattern graph exists in a target graph. It is distributed under the CeCILL-B free software license.
This graph matching library, developed by the MIVIA Lab of the University of Salerno (Italy), provides several algorithms for graph isomorphism, graph-subgraph isomorphism and graph monomorphism. The library is an efficient implementation of the VF2 graph matching algorithm, currently among the fastest, able to process very large graphs thanks to its linear memory complexity.
The library is written in C++, and can be easily adapted to use different formats for the graphs, and different types of attributes. The web page contains the library documentation, including some usage examples, and the source files of the library.
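For readers who want to try VF2-style matching without compiling the C++ library, networkx ships a Python implementation of the same algorithm family (this is not VFLib itself, only an illustration of the kind of query it answers):

```python
# VF2-style graph-subgraph isomorphism, illustrated with networkx's
# implementation (a Python port of the VF2 family, not MIVIA's VFLib).
import networkx as nx
from networkx.algorithms import isomorphism

target = nx.cycle_graph(5)   # 5-cycle
pattern = nx.path_graph(3)   # 3-vertex path, an induced subgraph of the cycle

gm = isomorphism.GraphMatcher(target, pattern)
print(gm.subgraph_is_isomorphic())               # the path embeds in the cycle
mapping = next(gm.subgraph_isomorphisms_iter())  # one target -> pattern mapping
print(mapping)
```

`GraphMatcher` also accepts node- and edge-match predicates, which corresponds to the attributed graphs the C++ library supports.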
This is a library for graph matching incorporating several algorithms. The library has been developed by Bruno Messmer at the University of Bern.
GraphM is a tool for approximate solving of large scale graph matching problems. The package was developed in the Centres for computational biology and for mathematical morphology at Mines ParisTech.
Graph edit distance is one of the most flexible mechanisms for error-tolerant graph matching. This software provides both a library and a graphical interface to compute the edit distance between sets of graphs using various heuristics: an A* algorithm with a bipartite heuristic (exact graph edit distance), beam search, and bipartite graph edit distance based on assignment algorithms.
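The bipartite heuristic mentioned above can be sketched briefly. This is not the toolkit's code, only a minimal sketch of the assignment-based construction popularised by Riesen and Bunke: a square cost matrix with substitution, deletion and insertion blocks, solved with the Hungarian algorithm (edge costs are omitted here for brevity, so the result is only a crude estimate):

```python
# Sketch of a bipartite GED heuristic (Riesen & Bunke style), not the
# toolkit's implementation: node assignment solved with the Hungarian
# algorithm; edge costs are ignored in this minimal version.
import numpy as np
from scipy.optimize import linear_sum_assignment

def bipartite_ged(labels1, labels2, c_sub=1.0, c_del=1.0, c_ins=1.0):
    """Estimate GED from an optimal node assignment only."""
    n, m = len(labels1), len(labels2)
    big = 1e9                            # forbids impossible pairings
    cost = np.zeros((n + m, n + m))
    # substitution block: pattern node i vs target node j
    for i in range(n):
        for j in range(m):
            cost[i, j] = 0.0 if labels1[i] == labels2[j] else c_sub
    # deletion block (diagonal) and insertion block (diagonal)
    cost[:n, m:] = big
    cost[n:, :m] = big
    for i in range(n):
        cost[i, m + i] = c_del
    for j in range(m):
        cost[n + j, j] = c_ins
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return cost[rows, cols].sum()

print(bipartite_ged(["C", "C", "O"], ["C", "O"]))  # one node deletion -> 1.0
```

In the full method, each entry of the substitution block also includes the cost of matching the local edge structures of the two nodes, which is what makes the resulting assignment a good starting point for the edit distance.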
ILPIso is a software implementation of the approach described in [Le Bodic et al. 2012] for substitution-tolerant subgraph isomorphism. It finds assignments between a pattern graph and a target graph: every vertex/edge of the pattern graph is matched to a distinct vertex/edge of the target graph. Differences between attributes are allowed but penalized by a cost function, and the matching minimizes the global assignment cost. It can process graphs whose vertices and/or edges are labeled with numerical vectors of any dimension.
The software can be configured to search for several instances of the matching and return them in non-decreasing cost order. Several settings are available to discard previously found solutions when searching for a new one.
Currently, only an online version and a compiled version are available (other architectures on demand), but source code should be provided soon.
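The objective ILPIso optimizes can be stated very compactly. The following is not its ILP formulation, only a brute-force sketch of the same objective, the minimal-cost injective mapping of a pattern into a target with penalties for label differences, which is practical only for tiny graphs:

```python
# Not ILPIso's ILP: a brute-force sketch of the same objective, the
# minimal-cost injective mapping of a pattern graph into a target graph,
# penalising label differences. Only usable for tiny graphs.
from itertools import permutations

def best_mapping(p_labels, p_edges, t_labels, t_edges):
    """Return (cost, mapping) minimising summed label differences over all
    injective vertex mappings that preserve the pattern's edges."""
    t_edge_set = {frozenset(e) for e in t_edges}
    best = (float("inf"), None)
    for perm in permutations(range(len(t_labels)), len(p_labels)):
        # every pattern edge must exist between the mapped target vertices
        if any(frozenset((perm[u], perm[v])) not in t_edge_set
               for u, v in p_edges):
            continue
        cost = sum(abs(p_labels[i] - t_labels[perm[i]])
                   for i in range(len(p_labels)))
        if cost < best[0]:
            best = (cost, perm)
    return best

# pattern: two adjacent vertices labelled 1.0 and 2.0
# target: a triangle labelled 1.1, 2.0, 5.0
cost, mapping = best_mapping([1.0, 2.0], [(0, 1)],
                             [1.1, 2.0, 5.0], [(0, 1), (1, 2), (0, 2)])
print(cost, mapping)
```

The ILP formulation replaces this enumeration with binary assignment variables and linear constraints, which is what lets ILPIso scale and enumerate solutions in non-decreasing cost order.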
- Graph Drawing <http://www.graphdrawing.org>
A site about graph drawing, maintained by some of the organizers of the International Symposium on Graph Drawing. The site contains links to several graph file formats, most of them based on XML.
- Girl Library
- An implementation of Combinatorial Pyramids
Descartes is a segmentation software based on a combinatorial-map encoding of partitions. It can import/export segmentations and is designed for interactive use: the user selects the regions to split or to merge.
A library which structures a partition using combinatorial maps. Multiple adjacencies and inclusions are encoded, and the data structure may be efficiently updated after both split and merge operations.
A library for designing hierarchical segmentation algorithms using pyramids of combinatorial maps (combinatorial pyramids). Multiple adjacencies are explicitly encoded, and inside/outside (i.e. inclusion) relationships may be efficiently computed. The implementation is based on an implicit encoding which allows any partition of the pyramid to be retrieved at low memory cost.
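For readers new to the encoding these libraries rely on, here is a minimal illustration (not any of the libraries' APIs): a combinatorial map is a set of darts (half-edges) with an involution alpha pairing the two darts of each edge and a permutation sigma listing the darts around each vertex; faces are the orbits of phi = sigma composed with alpha.

```python
# A minimal combinatorial-map sketch (not these libraries' APIs).
def orbits(perm):
    """Cycles of a permutation given as a dict dart -> dart."""
    seen, cycles = set(), []
    for start in perm:
        if start in seen:
            continue
        cycle, d = [], start
        while d not in seen:
            seen.add(d)
            cycle.append(d)
            d = perm[d]
        cycles.append(cycle)
    return cycles

# A triangle: 3 edges = 6 darts (0..5); darts 2k and 2k+1 form edge k.
alpha = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}   # edge involution
# darts around each vertex: {0, 5} at A, {1, 2} at B, {3, 4} at C
sigma = {0: 5, 5: 0, 1: 2, 2: 1, 3: 4, 4: 3}   # vertex permutation

phi = {d: sigma[alpha[d]] for d in alpha}       # face permutation
vertices, faces = orbits(sigma), orbits(phi)
print(len(vertices), len(faces))                # 3 vertices, 2 faces
# Euler check for the planar triangle: V - E + F = 3 - 3 + 2 = 2
```

Split and merge operations on a partition then amount to local updates of alpha and sigma, which is why these libraries can update the structure efficiently.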
ChemCpp is a C++ toolbox for chemoinformatics focusing on the computation of kernel functions between chemical compounds.
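To illustrate the idea of a kernel between chemical graphs (with something far simpler than the walk-based kernels ChemCpp implements, and not ChemCpp code), consider a vertex-label histogram kernel: the dot product of the two molecules' atom-count vectors.

```python
# Illustration of a kernel between labelled graphs (much simpler than
# ChemCpp's walk kernels): the vertex-label histogram kernel.
from collections import Counter

def label_histogram_kernel(labels1, labels2):
    """k(G1, G2) = sum over labels l of count_G1(l) * count_G2(l)."""
    h1, h2 = Counter(labels1), Counter(labels2)
    return sum(h1[l] * h2[l] for l in h1)

ethanol = ["C", "C", "O"]     # heavy atoms of ethanol
methanol = ["C", "O"]         # heavy atoms of methanol
print(label_histogram_kernel(ethanol, methanol))  # 2*1 + 1*1 = 3
```

Being an explicit dot product, this kernel is trivially positive semi-definite; the marginalized walk kernels used in chemoinformatics refine the same idea by comparing labelled walks instead of single vertices.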
- STRUCTURAL & SYNTACTICAL PATTERN RECOGNITION (TC2)
- Graphics Recognition (TC10)
- Reading systems (TC11)
IAPR's Technical Committee 2 on SSPR promotes interaction among researchers working on Structural and Syntactical Pattern Recognition (SSPR).
Graphics Recognition is an exciting field of pattern recognition. Along with optical character recognition (OCR) and document layout analysis, it forms the broader area of document image analysis and recognition. IAPR's Technical Committee 10 on Graphics Recognition promotes interaction among researchers working in document image analysis in general, and graphics recognition in particular.
Technical Committee 11 is concerned with theory and application of Reading Systems. The goal is to understand and develop systems which are able to analyze media containing character symbols and will return an encoded representation of the text content and structure.
This area covers research in the off-line recognition of printed texts and handwritten material, research on the automatic on-line recognition of handwriting as in pen-computing applications, as well as research in the analysis of electronic documents such as web pages. Applications include digital libraries, pen-based computing, check and mail reading, signature verification, web mining and content repurposing, web security using human interactive proofs, textual content analysis in videos, and historical document preservation and archiving.
The research topics cover all levels of processing, starting with the image or planar coordinates delivered by sensory technology, the necessary image and signal-processing stages, the subsequent segmentation and normalization stages, to feature extraction and selection, classification methods, statistical and syntactical pattern recognition and machine learning techniques.
- Simbad FP7 European Project
- GbR'2011 report
- Machine learning open source software
This project aims at undertaking a thorough study of several aspects of purely similarity-based pattern analysis and recognition methods, from the theoretical, computational, and application perspectives. It covers a wide range of problems, considering both supervised and unsupervised learning paradigms and both generative and discriminative models, with interests ranging from purely theoretical questions to real-world practical applications.
This report documents the discussion about challenges in graph-based representations held at the 8th IAPR - TC-15 workshop on graph-based representations in pattern recognition. The workshop took place in Münster (Germany) from the 18th to the 20th of May 2011. The discussion was initiated by four presenters (Milan Sonka, Luc Brun, Edwin Hancock, Pasquale Foggia) and moderated by Horst Bunke.
Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for a wide range of applications. Inspired by similar efforts in bioinformatics (BOSC) and statistics (useR), the aim of the mloss community is to build a forum for open source software in machine learning.