Clusterer & Cluster Navigator

Various methods for hierarchical graph clustering and cluster-based navigation.

Institution: Slovak University of Technology
Technologies used: Java, JUNG, MySQL
Inputs: publications meta-data
Outputs: hierarchical map of publications graph
Documentation (Clusterer): HTML, doc, JavaDoc
Documentation (Cluster Navigator): HTML, doc, JavaDoc

Addressed Problems

When an information space consists of objects and the relations between them (e.g. publications and the authors who published them), a graph representation can serve as a basis for easier user navigation in that space.

Unfortunately, a raw graph presentation can be overwhelming for the user, so additional structure is needed that lets the user select the level of granularity that fits best. Clustering can be used here to partition the dataset into subsets (clusters) of similar objects, and the resulting clustered graph can then be used to simplify the graph presentation.

Description

In general, many subgraphs of OWL data can be created for different purposes. This method takes ontology data stored in OWL as input and creates a subgraph, stored in a MySQL database, that is clustered later.
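
The idea of deriving a purpose-specific subgraph can be illustrated with a minimal sketch. It assumes the ontology data has already been read into plain subject/predicate/object triples; the class `SubgraphExtractor`, the `Triple` type, and the predicate names such as `hasAuthor` are hypothetical and do not come from the original tool. OWL parsing and the MySQL storage step are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: selects a subgraph from ontology triples by predicate. */
class SubgraphExtractor {

    /** A simplified statement taken from the ontology (assumed representation). */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Keeps only the triples whose predicate is wanted and returns them
     * as {source, target} edges. Different predicate sets yield different subgraphs.
     */
    static List<String[]> extractEdges(List<Triple> triples, Set<String> wantedPredicates) {
        List<String[]> edges = new ArrayList<>();
        for (Triple t : triples) {
            if (wantedPredicates.contains(t.predicate)) {
                edges.add(new String[] {t.subject, t.object});
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Triple> triples = Arrays.asList(
                new Triple("pub1", "hasAuthor", "alice"),
                new Triple("pub1", "hasTitle", "Some title"),
                new Triple("pub2", "hasAuthor", "alice"));
        Set<String> wanted = new HashSet<>(Arrays.asList("hasAuthor"));
        for (String[] edge : extractEdges(triples, wanted)) {
            System.out.println(edge[0] + " -- " + edge[1]);
        }
    }
}
```

In a setup like the one described, the resulting edge list would be written to the database table that the clustering step reads later.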

The clustering process takes the graph stored in the MySQL database and applies various graph clustering algorithms to create clusters. These clusters are then clustered again in a recursive fashion, resulting in a hierarchical, layered cluster tree. This hierarchical structure is stored back in the MySQL database.
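
The recursive layering can be sketched as repeated coarsening: cluster the nodes, treat every cluster as a node of a smaller graph, and cluster again. The sketch below is only an illustration under stated assumptions. The class `HierarchicalClusterer` and its methods are hypothetical, the graph is held in memory instead of MySQL, and the clustering step is a simple greedy neighbor pairing standing in for the actual algorithms, which are not specified here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: builds a layered cluster hierarchy by repeated coarsening. */
class HierarchicalClusterer {

    /** Adds an undirected edge to a symmetric adjacency map. */
    static void addEdge(Map<Integer, Set<Integer>> adj, int a, int b) {
        adj.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    /** Placeholder clustering step: pairs each unassigned node with one unassigned neighbor. */
    static Map<Integer, Integer> clusterOnce(Set<Integer> nodes, Map<Integer, Set<Integer>> adj) {
        Map<Integer, Integer> assignment = new TreeMap<>();
        int nextId = 0;
        for (int v : new TreeSet<>(nodes)) {
            if (assignment.containsKey(v)) continue;
            int id = nextId++;
            assignment.put(v, id);
            Set<Integer> neighbors = adj.getOrDefault(v, Collections.emptySet());
            for (int u : neighbors) {
                if (!assignment.containsKey(u)) {
                    assignment.put(u, id);
                    break;
                }
            }
        }
        return assignment;
    }

    /** Builds the graph of clusters: two clusters are adjacent if any of their members are. */
    static Map<Integer, Set<Integer>> coarsen(Map<Integer, Set<Integer>> adj,
                                              Map<Integer, Integer> assignment) {
        Map<Integer, Set<Integer>> coarse = new TreeMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : adj.entrySet()) {
            int cu = assignment.get(e.getKey());
            for (int w : e.getValue()) {
                int cw = assignment.get(w);
                if (cu != cw) addEdge(coarse, cu, cw);
            }
        }
        return coarse;
    }

    /** Returns one node-to-cluster map per layer, from the finest layer to the coarsest. */
    static List<Map<Integer, Integer>> buildHierarchy(Set<Integer> nodes,
                                                      Map<Integer, Set<Integer>> adj) {
        List<Map<Integer, Integer>> levels = new ArrayList<>();
        Set<Integer> currentNodes = new TreeSet<>(nodes);
        Map<Integer, Set<Integer>> currentAdj = adj;
        while (currentNodes.size() > 1) {
            Map<Integer, Integer> assignment = clusterOnce(currentNodes, currentAdj);
            Set<Integer> clusters = new TreeSet<>(assignment.values());
            if (clusters.size() == currentNodes.size()) break; // nothing merged, stop
            levels.add(assignment);
            currentAdj = coarsen(currentAdj, assignment);
            currentNodes = clusters;
        }
        return levels;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> adj = new TreeMap<>();
        addEdge(adj, 0, 1);
        addEdge(adj, 1, 2);
        addEdge(adj, 2, 3);
        List<Map<Integer, Integer>> levels = buildHierarchy(adj.keySet(), adj);
        for (int i = 0; i < levels.size(); i++) {
            System.out.println("layer " + (i + 1) + ": " + levels.get(i));
        }
    }
}
```

Each returned map corresponds to one layer of the cluster tree, so persisting the hierarchy amounts to storing one (node, cluster, layer) row per entry.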

Clusterer architecture

The clustering process itself is executed offline because of the computational and memory complexity of the graph ranking algorithms used.

The visualization tool (Cluster Navigator) consists of a server part and a client part. The server generates the requested part of the network and serves it to the client as a GraphML file. The client visualizes the retrieved GraphML file.

On the server, the JUNG library is used, which is capable of generating GraphML output. The server receives the ID of the cluster to be visualized from the client. The user can then click on a subcluster of the retrieved network or choose to go upward in the cluster hierarchy. In the latter case the server receives a different signal and replies with a GraphML file describing the network at a higher level.
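
The request/response cycle can be sketched as follows. This is a minimal illustration, not the tool's implementation: the class `ClusterGraphmlServer` and its methods are hypothetical, the hierarchy is kept in memory instead of MySQL, and the GraphML is written by hand to keep the example dependency-free, whereas the original server relies on JUNG for that output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: answers a cluster ID with a GraphML view of its subclusters. */
class ClusterGraphmlServer {
    private final Map<Integer, List<Integer>> children = new HashMap<>();
    private final Map<Integer, Integer> parent = new HashMap<>();
    private final List<int[]> edges = new ArrayList<>();

    /** Registers childId as a subcluster of parentId. */
    void addChild(int parentId, int childId) {
        children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
        parent.put(childId, parentId);
    }

    /** Registers an undirected edge between two clusters on the same layer. */
    void addEdge(int a, int b) {
        edges.add(new int[] {a, b});
    }

    /** "Go upward": returns the parent cluster ID, or the same ID for the root. */
    int parentOf(int clusterId) {
        return parent.getOrDefault(clusterId, clusterId);
    }

    /** Builds the GraphML document for the subclusters of the requested cluster. */
    String graphmlFor(int clusterId) {
        List<Integer> members = children.getOrDefault(clusterId, Collections.emptyList());
        Set<Integer> visible = new HashSet<>(members);
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        sb.append("  <graph id=\"cluster-").append(clusterId)
          .append("\" edgedefault=\"undirected\">\n");
        for (int m : members) {
            sb.append("    <node id=\"c").append(m).append("\"/>\n");
        }
        for (int[] e : edges) {
            // Only edges between two visible subclusters belong to this view.
            if (visible.contains(e[0]) && visible.contains(e[1])) {
                sb.append("    <edge source=\"c").append(e[0])
                  .append("\" target=\"c").append(e[1]).append("\"/>\n");
            }
        }
        sb.append("  </graph>\n</graphml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        ClusterGraphmlServer server = new ClusterGraphmlServer();
        server.addChild(0, 1);
        server.addChild(0, 2);
        server.addChild(1, 3);
        server.addChild(1, 4);
        server.addEdge(1, 2);
        server.addEdge(3, 4);
        System.out.print(server.graphmlFor(1));                  // user clicked subcluster 1
        System.out.print(server.graphmlFor(server.parentOf(1))); // user went one level up
    }
}
```

Clicking a subcluster then maps to a request with that subcluster's ID, and going upward maps to a request with the parent's ID, so the client never needs the whole hierarchy at once.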

Visualization of the network

References

  1. Mária Bieliková, Gyorgy Frivolt, Ján Suchal, Richard Veselý, Peter Vojtek, and Oto Vozár: Creation, population and preprocessing of experimental data sets for evaluation of applications for the semantic web. In: Lecture Notes in Computer Science, SOFSEM 2008.
  2. Gy. Frivolt, M. Bieliková: A community-cutting approach. In: RAWS 2005 - Proc. of the 1st Int. Workshop on Representation and Analysis of Web Space. 2005. pages 49-59.
  3. Gy. Frivolt, M. Bieliková: Analysis of massive networks. In: IIT.SRC 2005 - Informatics and Information Technologies Student Research Conference. 2005. pages 35-40.