A              Factic – Faceted Semantic Browser

A.1          Basic Information

Effective search and navigation in large (open) information spaces requires specialized tools that support easy construction of (semantic) search queries as well as effective browsing in the search results with optional query modification.

Suitable user interfaces are needed that support advanced search and navigation paradigms such as faceted navigation. The faceted semantic browser Factic supports dynamic creation of semantic search queries via user navigation (i.e., selecting restrictions in the set of available facets) and presents the search results in an integrated interface that allows for simultaneous searching and browsing.

A.1.1      Basic Terms

Faceted browser

A software tool that allows users to select subspaces of an information space by successively selecting restrictions in one or more facets. Faceted browsers are based on a faceted classification of the corresponding information space – an orthogonal multidimensional (hierarchic) classification of data.

Facet

A dimension of the information space, which usually corresponds to a single attribute of an information artifact.

Restriction

The value of an attribute which the search results must match in order to be presented in a faceted browser.

Adaptive

Capable of adjusting itself to the needs and requirements of the user automatically, e.g. based on automatically acquired user data and their evaluation.

Adaptable

Capable of being manually adjusted by the user to suit their needs and requirements.

 

 

A.1.2      Method Description

Factic is based on the faceted browser paradigm, where the user navigates the information space by selecting one or more restrictions in the set of available facets. The search results are the instances of the respective information space that satisfy the selected restrictions.
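The selection semantics described above can be sketched in a few lines of Java. This is an illustrative in-memory model only, not Factic's implementation (which works against an ontological repository); the class name FacetFilterSketch, the map-based representation of instances and the sample attribute names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: an in-memory model of faceted selection (not Factic's code). */
final class FacetFilterSketch {

    /**
     * Returns the instances that satisfy every selected restriction.
     * Each instance is modelled as a map from facet (attribute) name to value;
     * each restriction maps a facet name to the value that must match.
     */
    static List<Map<String, String>> select(List<Map<String, String>> instances,
                                            Map<String, String> restrictions) {
        List<Map<String, String>> results = new ArrayList<Map<String, String>>();
        for (Map<String, String> instance : instances) {
            boolean matches = true;
            for (Map.Entry<String, String> r : restrictions.entrySet()) {
                // An instance without the attribute never matches a restriction on it.
                if (!r.getValue().equals(instance.get(r.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                results.add(instance);
            }
        }
        return results;
    }

    /** Convenience helper that builds an attribute map from key/value pairs. */
    static Map<String, String> attrs(String... keyValues) {
        Map<String, String> m = new HashMap<String, String>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> offers = new ArrayList<Map<String, String>>();
        offers.add(attrs("region", "Europe", "contract", "full-time"));
        offers.add(attrs("region", "Europe", "contract", "part-time"));
        offers.add(attrs("region", "Asia", "contract", "full-time"));

        // Selecting restrictions in two facets narrows the result set.
        System.out.println(select(offers, attrs("region", "Europe", "contract", "full-time")));
    }
}
```

In this model, adding a restriction can only shrink the result set, and removing one can only enlarge it.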

The overview of the navigation method is shown in Fig. 1. The extensions of the method over classical faceted browsers are shown in gray and include the adaptation and annotation of facets, restrictions and search results, together with support for semantic logging of events.

Fig. 1. Overview of the navigation method used by Factic.

Factic constructs a semantic query based on the user’s selection of restrictions in facets and returns the corresponding search results. Furthermore, the set of available facets is adapted based on the user model of the current user, which is retrieved from the ontological repository, and based on the user’s activity during the current session.
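How a set of selected restrictions might be turned into a query can be sketched as follows. The sketch is hypothetical: SPARQL is used purely for illustration (this section does not state which query language Factic emits), and the class name QueryBuilderSketch, the variable ?instance and the example.org URIs are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: turns selected restrictions into a SPARQL-style query string. */
final class QueryBuilderSketch {

    /**
     * Builds a SELECT query for instances of the given type.
     * The restrictions map a property URI to the required value URI;
     * both are expected as full URIs. The query language is an assumption.
     */
    static String build(String instanceTypeUri, Map<String, String> restrictions) {
        StringBuilder q = new StringBuilder();
        q.append("SELECT ?instance WHERE {\n");
        q.append("  ?instance a <").append(instanceTypeUri).append("> .\n");
        for (Map.Entry<String, String> r : restrictions.entrySet()) {
            // One triple pattern per selected restriction; the patterns are conjunctive.
            q.append("  ?instance <").append(r.getKey()).append("> <")
             .append(r.getValue()).append("> .\n");
        }
        q.append("}");
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, String> restrictions = new LinkedHashMap<String, String>();
        // Hypothetical property and value URIs for a job offer domain.
        restrictions.put("http://example.org/jobs#hasRegion",
                         "http://example.org/regions#Europe");
        System.out.println(build("http://example.org/jobs#JobOffer", restrictions));
    }
}
```

The sketch performs no validation or escaping of URIs; code that accepts user-supplied values would need both.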

A.1.3      Scenarios of Use

Factic can be used if a suitable (semi)structured ontological model of an information domain is available. Hence, Factic can be used in the following scenarios:

§  Users search for specific information, e.g. job offers that satisfy a given set of criteria.

§  Users browse the information space in order to gain an overview of the scope and content of available information.

Factic should not be used in the following cases:

§  There is no domain model available.

§  The information space of interest is unstructured.

§  The information space is “small”, i.e. it contains only a few instances.

§  The metadata about individual instances is sparse – many instances have undefined attribute values or the instances have only very few attributes.

A.1.4      External Links and Publications

§  Tvarožek, M., & Bieliková, M. (2007). Personalized Faceted Navigation in the Semantic Web. In L. Baresi, P. Fraternali, & G.-J. Houben (Eds.), Lecture Notes in Computer Science: Proc. of the International Conference on Web Engineering (ICWE 2007). LNCS 4607, pp. 511-515. Como, Italy: Springer-Verlag, Berlin Heidelberg.

§  Tvarožek, M., Barla, M., & Bieliková, M. (2007). Personalized Presentation in Web-Based Information Systems. In J. Van Leeuwen, G. F. Italiano, W. van der Hoek, H. Sack, C. Meinel, & F. Plášil (Eds.), Lecture Notes in Computer Science: Proceedings of SOFSEM 2007 - Theory and Practice of Computer Science. LNCS 4362, pp. 796-807. Harrachov, Czech Republic: Springer-Verlag, Berlin Heidelberg.

§  Cocoon, a Java-based MVC web framework, Apache Software Foundation. (http://cocoon.apache.org/)

§  Log4J, Java-based logging utility, Apache Software Foundation. (http://logging.apache.org/log4j)

§  Tomcat, a Java servlet container, Apache Software Foundation. (tomcat.apache.org)

§  Sesame, an ontological (RDF/RDFS/OWL) repository. (http://openrdf.org/)

A.2          Integration Manual

Factic is developed in Java (Standard Edition 5) as a generator component for the Apache Cocoon MVC framework. The distribution of Factic consists of these parts:

Factic is designed to be used as part of a Cocoon portal application (Job Offer Portal – JOP). It was not designed to be used programmatically nor as a standalone application, although the dependencies on the Cocoon Portal can be removed in code with reasonable effort resulting in some functionality loss (e.g., no user session monitoring and limited personalization as no user account would be available).

A.2.1      Dependencies

Factic uses these external tools and libraries:

Additional (optional) dependencies required for advanced functions are:

A.2.2      Installation

Before deploying Factic into an application the following prerequisites must be met:

Deploying Factic involves these steps:

1.    The jar archives of dependencies must be deployed in Cocoon in the WEB-INF\lib\ directory.

2.    The Factic jar archives must be deployed in Cocoon in the WEB-INF\lib\ directory.

3.    The configuration files, sitemaps, i18n files and XSL presentation templates of Factic must be deployed in JOP in the coplets directory (e.g., in a subdirectory Factic). The configuration of JOP associated with adding a new coplet must be performed in the configuration files of JOP (e.g., Factic can work as a coplet tab).

4.    The images and CSS styles of Factic must be deployed in the proper directory in JOP (depends on the configuration of JOP).

A.2.3      Configuration

Factic must be configured to use a suitable ontological repository (factic.properties), domain ontology (factic.xml) and Log4J service (log4j.properties).

The log4j.properties file contains the following values:

log4j.appender.facticLog=org.apache.log4j.FileAppender

log4j.appender.facticLog.file=logs/Factic.log

log4j.appender.facticLog.layout=org.apache.log4j.PatternLayout

log4j.appender.facticLog.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%C{1}[%M:%L] - %m%n

log4j.logger.Factic=DEBUG, facticLog

The factic.properties file contains the following values:

§  SesameRepositoryURL – host address of the used Sesame repository.

§  SesameRepositoryID – the ID of the used Sesame repository.

§  SesameRepositoryUsername – the user name for the used Sesame repository.

§  SesameRepositoryPassword – the password for the used Sesame repository.
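Reading and checking these four settings can be sketched with the standard java.util.Properties class. The sketch is not taken from Factic's source: the class name FacticPropertiesSketch, the decision to treat all four keys as mandatory and the sample values are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: loads and checks the settings listed for factic.properties. */
final class FacticPropertiesSketch {

    /** Keys as documented above; treating all of them as mandatory is an assumption. */
    static final String[] REQUIRED_KEYS = {
        "SesameRepositoryURL", "SesameRepositoryID",
        "SesameRepositoryUsername", "SesameRepositoryPassword"
    };

    /** Parses the settings and fails fast if a documented key is absent. */
    static Properties load(Reader source) {
        Properties settings = new Properties();
        try {
            settings.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : REQUIRED_KEYS) {
            if (settings.getProperty(key) == null) {
                throw new IllegalArgumentException(
                        "Missing setting in factic.properties: " + key);
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder values; a real deployment would read the file from disk.
        String sample = "SesameRepositoryURL=http://localhost:8080/sesame\n"
                + "SesameRepositoryID=example-repository\n"
                + "SesameRepositoryUsername=example-user\n"
                + "SesameRepositoryPassword=example-password\n";
        Properties settings = load(new StringReader(sample));
        System.out.println(settings.getProperty("SesameRepositoryURL"));
    }
}
```

Failing at start-up on a missing key is one way to catch the configuration errors mentioned under error handling before the first request is processed.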

The factic.xml file contains the following sets of values:

§  Global configuration (element <factic>)

o  language – the language of the user interface, which selects the language in ontological queries for labels and the i18n transformer.

o  instanceTypeURI – target instance type, which determines the type of instances that are browsed.

o  itemsPerPage – the default number of search results displayed per page.

o  maxInstanceValueDepthOverview – the depth of recursive property retrieval for instance overview.

o  maxInstanceValueDepthDetails – the depth of recursive property retrieval for instance details.

o  maxInstanceCount – the maximum processed instance count (unused).

§  Namespace prefixes, which describe prefixes of XML namespaces used in the configuration file and in queries (elements <prefixes>, <prefix>)

o  prefix – the short prefix.

o  full – the full URI.

§  Sorting actions that can be performed on the search results in the overview (elements <sortingActions>, <sortingAction>)

o  label – the textual description of the action for presentation.

o  type – the data type of the target value used for sorting.

o  propertyURI – the property that associates the search result type with the sorting value.

o  indirectPropertyURI# – intermediate properties that associate the property value and the original instance for indirect properties (replace # with numbers starting from 0, example: {instance} indirectPropertyURI0 {} indirectPropertyURI1 {} propertyURI {value}).

§  Facet definitions (elements <facets>, <facet>)

o  type – the type of the facet that determines how the property values are processed. Based on facet type, additional attributes might be necessary. Supported facet types are:

-     ClasshierarchyFacet – constructs a facet based on a class hierarchy rooted at rangeURI using both its subclasses and their corresponding instances.

-     InstanceHierarchyFacet – constructs a facet based on an instance hierarchy rooted at the given rangeURI, as defined by the property transitivePropertyURI between instances of that type.

-     RelativeDateLiteralFacet – constructs a facet based on the date defined by the given property, relative to the current date.

-     AbsoluteDateLiteralFacet – constructs a facet based on the date defined by the given property and allows the user to select a specific date; requires additional attributes minYear, maxYear that define the minimum and maximum year that will be available.

-     FloatLiteralFacet – constructs a facet based on a decimal literal value; requires additional attributes minValue, maxValue, partitionCount, that describe the minimum and maximum values of available intervals and the number of interval partitions for hierarchical interval selection respectively.

-     IntegerLiteralFacet – constructs a facet based on an integer literal value; requires additional attributes minValue, maxValue, partitionCount, that describe the minimum and maximum values of available intervals and the number of interval partitions for hierarchical interval selection respectively.

o  propertyURI – the URI of the property that associates the search result type with the respective restriction value.

o  rangeURI – the URI of the value type that is processed by the facet.

o  indirectPropertyURI# – intermediate properties that associate the property with the search result; used as with sorting actions.

§  Instance type definitions (elements <instanceTypes>, <instanceType>)

o  uri – the URI of the instance type as used in the configuration file and in the domain and user ontologies.

o  literal – a Boolean flag indicating whether the instance type is a literal.

-     For non-literal types (i.e. object properties in the ontology), additional parameters are specified that describe the properties of each type (element <property>).

·      uri – the URI of the property.

·      range – the URI of the target data type of the property.

·      provider – the data provider used to retrieve the values of the property (currently only Sesame is a valid option).
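The attributes minValue, maxValue and partitionCount of the numeric facet types suggest that a value range is split into sub-intervals that can be refined step by step. The sketch below shows one such level. It assumes equal-width partitions, which this section does not confirm, and the class name IntervalPartitionSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one level of interval partitioning for a numeric facet. */
final class IntervalPartitionSketch {

    /**
     * Splits [min, max] into count equal-width intervals.
     * Each interval is returned as a two-element array {lower, upper}.
     */
    static List<double[]> partition(double min, double max, int count) {
        if (count < 1 || max <= min) {
            throw new IllegalArgumentException("Need count >= 1 and max > min");
        }
        List<double[]> intervals = new ArrayList<double[]>();
        double width = (max - min) / count;
        for (int i = 0; i < count; i++) {
            double lower = min + i * width;
            // Use max itself as the last bound to avoid floating-point drift.
            double upper = (i == count - 1) ? max : min + (i + 1) * width;
            intervals.add(new double[] {lower, upper});
        }
        return intervals;
    }

    public static void main(String[] args) {
        // First level: the whole configured range.
        List<double[]> top = partition(0, 100, 4);
        // Second level: refine the interval the user selected.
        double[] chosen = top.get(1);
        for (double[] sub : partition(chosen[0], chosen[1], 4)) {
            System.out.println(sub[0] + " .. " + sub[1]);
        }
    }
}
```

Applying the same function to a selected interval yields the next, finer level of the hierarchy.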

Unless stated otherwise, all URIs should be written in prefixed form using the prefixes defined in the configuration.
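Expanding a prefixed URI with the prefix/full pairs from the configuration can be sketched as follows. This is a hypothetical helper, not Factic's code; the class name PrefixExpanderSketch, the prefix jo and its namespace are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: expands prefixed URIs using configured prefix/full pairs. */
final class PrefixExpanderSketch {

    /**
     * Replaces a known prefix (the text before the first colon) with its full URI.
     * Strings without a colon or with an unknown prefix are returned unchanged,
     * so full URIs such as "http://..." pass through untouched.
     */
    static String expand(String uri, Map<String, String> prefixes) {
        int colon = uri.indexOf(':');
        if (colon <= 0) {
            return uri;
        }
        String full = prefixes.get(uri.substring(0, colon));
        return full == null ? uri : full + uri.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<String, String>();
        // Hypothetical prefix definition, as it might appear in a <prefix> element.
        prefixes.put("jo", "http://example.org/jobs#");
        System.out.println(expand("jo:JobOffer", prefixes));
    }
}
```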

Additionally, the XSL presentation templates must be properly configured for the target application domain (e.g., job offers) by referencing the proper property URIs. These are used to render the final GUI of Factic and thus perform the transformation of XML output data from the core Factic engine to XHTML that is sent to the client web browser. Since the XSL templates define the overall layout and content of the browser they should be created by a person skilled in web application interface design.

A.2.4      Integration Guide

Factic can be used to navigate and search in an OWL ontology stored in an ontological repository. A sample user interface of Factic is shown in Fig. 2. The set of available facets and restrictions is shown on the left. The top three facets are expanded and annotated. These can be disabled via the light bulb button. The bottom two facets (Acquisition date and Hours/week) are “hidden”, i.e. their contents are not shown, but can be displayed on demand via the green checkbox button.

The currently selected restrictions are shown at the top right. By following the links, restrictions in individual facets can be removed, resulting in more search results (center). Individual search results can be rated by clicking the stars on the right, and their details can be viewed by clicking their name. Clicking “Rate it!” invokes the external personalized ordering tool UPREA, which processes the given user ratings and reorders the search results based on estimated user preferences (via tools IGAP and TopK).

The view can be customized by selecting the number of items per page or by sorting the instances based on a given attribute (top right, below current restrictions). If more pages of search results are available, these can be selected in the navigation windows at the bottom.
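The paging behaviour described here can be sketched as a simple slicing of the result list. The sketch is an assumption about how paging could be computed, not a description of Factic's code; the class name PagingSketch and the zero-based page index are choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: splits a result list into pages of itemsPerPage entries. */
final class PagingSketch {

    /** Number of pages needed to show all results (rounded up). */
    static int pageCount(int totalResults, int itemsPerPage) {
        return (totalResults + itemsPerPage - 1) / itemsPerPage;
    }

    /** Returns the results on the given zero-based page, or an empty list past the end. */
    static <T> List<T> page(List<T> results, int pageIndex, int itemsPerPage) {
        if (itemsPerPage < 1 || pageIndex < 0) {
            throw new IllegalArgumentException("Need itemsPerPage >= 1 and pageIndex >= 0");
        }
        int from = pageIndex * itemsPerPage;
        if (from >= results.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(from + itemsPerPage, results.size());
        // Copy the slice so that later changes to the source list do not leak in.
        return new ArrayList<T>(results.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("offer1", "offer2", "offer3", "offer4", "offer5");
        for (int i = 0; i < pageCount(results.size(), 2); i++) {
            System.out.println("Page " + (i + 1) + ": " + page(results, i, 2));
        }
    }
}
```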

Fig. 2. Sample user interface of Factic.

Error handling

Most errors originate from bad configuration (e.g., a non-existing repository), from errors in the data itself (e.g., non-standard characters), or from caching, connection pooling and session issues in Cocoon, Sesame or MySQL used by associated tools.

Factic typically responds to an error by throwing an unhandled exception, which crashes the application unless Cocoon intercepts it (as it often does), and by writing a corresponding entry to the error log.

A.3          Development Manual

A.3.1      Tool Structure

Factic consists of the following packages (structure and dependencies of the packages are displayed in Fig. 3):

§  sk.fiit.nazou.factic – the main package, which contains core files.

§  sk.fiit.nazou.factic.DataProvider – defines generic data provider interfaces.

§  sk.fiit.nazou.factic.DataProvider.Sesame – contains the implementation of data provider interfaces for the Sesame ontological repository.

§  sk.fiit.nazou.factic.Facet – contains implementations of different facet types.

§  sk.fiit.nazou.factic.Resource – contains classes that are used to load data about ontological instances from the Sesame repository.

§  sk.fiit.nazou.factic.ViewProcessor – defines interfaces and implementations of individual view processors.

Fig. 3. Factic packages – structure and dependencies.

A.3.2      Method Implementation

Fig. 4 depicts typical request processing performed by Factic.

Fig. 4. Sequence diagram – request processing of Factic.

A.3.3      Enhancements and Optimization

Factic has not yet been optimized for fast response times, although limited optimization efforts were made. The primary performance bottleneck is the ontological repository Sesame. Consequently, future optimization would likely include caching of data retrieved from Sesame and/or asynchronous communication with the client via AJAX, and thus asynchronous GUI updates resulting in shorter response times.
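The caching idea mentioned above could take the form of a small least-recently-used cache placed in front of repository lookups. The sketch below is one possible design under stated assumptions, not part of Factic: the class name QueryCacheSketch is hypothetical, and the loader function stands in for whatever call actually retrieves data from Sesame.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: a bounded LRU cache in front of an expensive lookup. */
final class QueryCacheSketch<K, V> {

    private final Map<K, V> cache;
    private final Function<K, V> loader;
    private int loads = 0;

    /** The loader stands in for the repository call; capacity bounds memory use. */
    QueryCacheSketch(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a cache miss. */
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    /** Number of times the loader was actually invoked. */
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        QueryCacheSketch<String, String> cache = new QueryCacheSketch<String, String>(
                100, query -> "result of " + query); // placeholder for a repository call
        cache.get("facet:region");
        cache.get("facet:region"); // served from the cache
        System.out.println("Loader calls: " + cache.loadCount());
    }
}
```

The sketch has no invalidation: cached results become stale when the repository content changes, so a real cache would need an expiry or invalidation strategy.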

A.4          Manual for use in Other Application Domains

The search and navigation method used in the Factic tool is universal and thus domain independent. In general, it can be used in any domain for which an ontological domain model in OWL is available, provided that the ontology also contains the instance data that is to be browsed.

Consequently, no code changes are required in order to use Factic in a different application domain. We demonstrated this by using Factic in the domain of job offers, and in the domains of scientific publications and images, without code changes.

A.4.1      Configuration for use in Other Application Domains

In order to use Factic in another application domain, it must be properly configured to process a different ontology. This includes the configuration of the:

A.4.2      Dependencies

Most dependencies are domain independent and thus require no domain-specific adjustments. The SemanticLog logging service is domain independent because it only stores the events passed to it by Factic, which already adapts them to the target domain where necessary. The inference rules used by the LogAnalyzer inference agent depend on the navigation model of a faceted browser rather than on the application domain; however, if additional domain-specific rules are used, they may need to be adjusted for the new domain.