A.1 Basic Information
Personalization is becoming increasingly important for working effectively with information as the amount of available content keeps growing. Systems become adaptive by taking the characteristics of their users into account.
LogAnalyzer supports personalization of web-based systems by discovering user characteristics. It processes the user activity log acquired by the SemanticLog tool, identifies meaningful user characteristics, and updates the user model to reflect the newly gained knowledge. It focuses on evaluating user navigation in the available information space.
A.1.1 Basic Terms
User characteristic – any piece of information concerning the user that could be used to provide personalization (e.g., goals, interests, knowledge).
URI – Uniform Resource Identifier, a compact string of characters used to identify or name a resource.
A.1.2 Method Description
LogAnalyzer uses a rule-based approach to the analysis of user logs. The analysis process is depicted in Fig. 1. Externally defined rules represent heuristics that link interesting navigational patterns with sets of changes to be applied to the user model when an occurrence of the pattern is detected.
Fig. 1. Overview of user characteristics acquisition process. Data from presentation tools and client-side logging are stored in a database of user actions. LogAnalyzer tries to detect (1) occurrences of predefined patterns and optionally stores intermediate results (2). The heuristics associated with the detected pattern (3) predict the update of characteristics stored in a user model (4).
The main benefits of this approach are reusability and flexibility: the behavior of the process can be changed easily by changing the underlying heuristics.
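The link between navigational patterns and user model changes can be illustrated with a minimal Java sketch. It is not LogAnalyzer's implementation: the class names, the event-type strings, the characteristic name, and the simple contiguous matching are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a rule: a pattern of event types plus one change
// (characteristic name and value delta) applied when the pattern occurs.
class SketchRule {
    final List<String> pattern;
    final String characteristic;
    final double delta;

    SketchRule(List<String> pattern, String characteristic, double delta) {
        this.pattern = pattern;
        this.characteristic = characteristic;
        this.delta = delta;
    }
}

class RuleEngineSketch {
    // Counts non-overlapping contiguous occurrences of the pattern in the log.
    static int countOccurrences(List<String> log, List<String> pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int count = 0;
        int i = 0;
        while (i + pattern.size() <= log.size()) {
            if (log.subList(i, i + pattern.size()).equals(pattern)) {
                count++;
                i += pattern.size();
            } else {
                i++;
            }
        }
        return count;
    }

    // Applies each rule's change once per detected occurrence of its pattern.
    static Map<String, Double> analyze(List<String> log, List<SketchRule> rules) {
        Map<String, Double> userModel = new HashMap<>();
        for (SketchRule rule : rules) {
            int hits = countOccurrences(log, rule.pattern);
            if (hits > 0) {
                userModel.merge(rule.characteristic, hits * rule.delta, Double::sum);
            }
        }
        return userModel;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "ShowList", "ShowDetail", "ShowList", "ShowDetail", "Back");
        List<SketchRule> rules = Arrays.asList(
                new SketchRule(Arrays.asList("ShowList", "ShowDetail"),
                        "interestInDetails", 0.25));
        System.out.println(analyze(log, rules));
    }
}
```

Because the rules are plain data passed to `analyze`, swapping the heuristics requires no change to the matching code, which is the flexibility described above.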
A.1.3 Scenarios of Use
LogAnalyzer can be used in the following scenarios:
§ The navigational model of a web-based application allows individual users to navigate freely in the information space (so user behavior can be influenced by user characteristics).
§ The presentation layer of a web application is able to adapt content and/or navigation according to characteristics stored in a user model.
LogAnalyzer should not be used in the following cases:
§ Unsuitable navigational model (e.g., strictly sequential without possibilities to change the order of displayed pages).
Publications:
§ Tvarožek, M., Barla, M., Bieliková, M.: Personalized Presentation in Web-Based Information Systems. In: van Leeuwen, J. et al. (Eds.): SOFSEM 2007, Springer, LNCS 4362, Harrachov, ČR, pp. 796-807, 2007.
§ Tvarožek, M., Barla, M., Bieliková, M.: Personalized Recommendation of Browsing in Large Information Spaces with Semantics. In: Sobecki, J. (Ed.): Special Issue of New Generation Computing – Web-based Recommendation Systems Technologies and Applications, Vol. 26, No. 3, May 2008 (submitted).
Dependencies:
§ Commons Configuration, Java configuration API, Apache Software Foundation (http://commons.apache.org/configuration/)
§ Hibernate, Relational Persistence for Java, Red Hat Middleware, LLC (http://www.hibernate.org/)
§ Log4J, Java-based logging utility, Apache Software Foundation (http://logging.apache.org/log4j)
§ ITG, integration technology developed within the NAZOU project
§ OntoCM, Java-based connector to the ontological part of corporate memory, developed within the NAZOU project
§ SemanticLog, Java-based logging service developed within the NAZOU project
§ UserLogs, Java-based library providing an object-oriented representation of user action records
A.2 Integration Manual
LogAnalyzer is developed in Java (Standard Edition 5) and distributed as a jar archive. Its functionality is accessed through a Java interface. LogAnalyzer is not a stand-alone application; it is intended to be included in another application or tool (such as the SemanticLog tool), which calls the LogAnalyzer interface methods.
LogAnalyzer uses these external tools and libraries:
§ Commons Configuration – for processing XML-based configuration files.
§ Hibernate – for accessing log records and storing intermediate results.
§ OntoCM – for accessing and updating the ontology-based user model.
§ ITG – for accessing configuration files stored in a common folder.
§ Log4J – for logging.
Deploying LogAnalyzer into another application requires the following steps (any Java Integrated Development Environment can be used):
1. LogAnalyzer and all external jar archives must be included in the existing project.
2. The hibernate.cfg.xml and log4j.properties files must be included in the classpath.
3. The files LogAnalyzer.properties, UserModelUpdater.properties, UserCharacteristics.properties, OntoMem.properties, and the file containing the set of rules for LogAnalyzer must be placed in the LogAnalyzer directory inside the directory holding common configuration files (see ITG for further details).
4. The integration technology ITG must be deployed correctly, i.e., the Nazou-bootstrap.properties file must be placed in the classpath.
5. The UserLogs database of the SemanticLog tool should be updated using the createTables_LogAnalyzer.sql script.
Several configuration files must be set up for LogAnalyzer to work properly. LogAnalyzer.properties contains the following properties:
§ startTime – contains the date and time when LogAnalyzer was last invoked.
§ rules – points to an XML configuration file containing the rules LogAnalyzer should work with.
§ configDigest – contains the MD5 digest of the rules configuration file. If the set of LogAnalyzer’s rules changes, the tool discards all intermediate results currently held in the relational database.
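The configDigest check can be sketched with the standard java.security.MessageDigest class. This is an illustration only: the class and method names are hypothetical, and how LogAnalyzer actually reads the file or encodes the digest is not specified here (a lowercase hex string is assumed).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ConfigDigestSketch {
    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // True when the stored digest no longer matches the current rules content,
    // i.e., when intermediate results would have to be discarded.
    static boolean rulesChanged(String storedDigest, byte[] rulesContent) {
        return !md5Hex(rulesContent).equalsIgnoreCase(storedDigest);
    }

    public static void main(String[] args) {
        byte[] rules = "<rules/>".getBytes(StandardCharsets.UTF_8);
        String stored = md5Hex(rules);
        System.out.println(rulesChanged(stored, rules));   // false
        byte[] edited = "<rules><rule/></rules>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rulesChanged(stored, edited));  // true
    }
}
```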
UserCharacteristics.properties maps types of user characteristics to their providers: each property key is the URI identifying a type of user characteristic, and its value is the fully qualified class name (including the package name) of the class that gives the tool access to characteristics of that type.
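A mapping from characteristic type URIs to class names can be resolved by reflection, as in the following sketch. It uses java.util.Properties as a stand-in for the configuration library; the provider interface, its single method, the URI, and the class names are hypothetical and do not reproduce the real provider interface of the tool.

```java
import java.io.StringReader;
import java.util.Properties;

// Hypothetical provider contract, used only for this illustration.
interface SketchProvider {
    String describe();
}

// Hypothetical provider implementation for one characteristic type.
class InterestProviderSketch implements SketchProvider {
    public String describe() {
        return "interest provider";
    }
}

class ProviderLookupSketch {
    // In java.util.Properties syntax an unescaped ':' ends the key,
    // so the colon of the URI is escaped with a backslash.
    static final String SAMPLE =
            "http\\://example.org/um#Interest = "
            + InterestProviderSketch.class.getName() + "\n";

    // Loads the mapping and instantiates the class registered for the given URI.
    static SketchProvider providerFor(String mapping, String typeUri) throws Exception {
        Properties props = new Properties();
        props.load(new StringReader(mapping));
        String className = props.getProperty(typeUri);
        if (className == null) {
            throw new IllegalArgumentException("No provider for " + typeUri);
        }
        return (SketchProvider) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        SketchProvider p = providerFor(SAMPLE, "http://example.org/um#Interest");
        System.out.println(p.describe());
    }
}
```

Keeping the class name in configuration means a provider for a new characteristic type can be added without recompiling the code that performs the lookup.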
UserModelUpdater.properties contains the following properties:
§ changeCharacteristicsFromOtherTools – boolean value determining whether the tool is allowed to change characteristics whose source is other than LogAnalyzer.
§ sourceProperty – URI of a property that connects a characteristic with its source.
§ sourceInstance – URI of an instance representing LogAnalyzer as a source of user characteristics.
§ predicateToDomainDepUser – URI of a property linking a domain-dependent user with a domain-independent instance.
§ domainIndepNamespace – namespace of the domain-independent part of the user model.
§ domainDepNamespace – namespace of the domain-dependent part of the user model.
§ countOfUpdatesProperty – URI of a property linking a characteristic with its count of updates.
§ timeStampProperty – URI of a property linking a characteristic with the timestamp of its last update.
§ confidenceProperty – URI of a property linking a characteristic with its confidence value.
§ relevanceProperty – URI of a property linking a characteristic with its relevance value.
§ characteristicConnectionToDomainDepUser – URI of a property linking a domain-dependent user instance to a characteristic instance.
OntoMem.properties contains the configuration of the OntoCM tool for accessing an ontological repository. It can be left blank if the tool should use the global configuration from Nazou-commons.properties.
The rules are defined in an XML-based configuration file with the root element <rules>, which encapsulates <rule> elements, each having an id attribute (URI). Each rule has two children: the <sequence> and <consequences> elements.
§ <sequence> element represents a sequence of events and sub-sequences in a pattern. It can define contextual conditions in a child <context> element. A sequence can also have <sequence> and <event> elements as children and has the following attributes:
o id – serves for cross-referencing within the configuration file and for internal processing; it should be unique throughout the whole configuration file.
o count-of-occurrence – an integer value defining the required count of occurrences of the sequence within a pattern. A value of -1 means the sequence is optional (i.e., not strictly required for the pattern to be found in the log).
o isContinuous – boolean value defining whether the sequence is continuous, i.e., whether the events of the sequence must strictly follow each other.
§ <context> element (as a child of a sequence element) restricts the events mapped to the sequence to those fulfilling a restriction on an attribute of a certain type. The displayed item is specified in the <typeOfDisplayedItem> child element and the type of its attribute in the <attribute> child element.
§ <event> element represents a single event in the event log. An event can define several contextual conditions represented by multiple child <context> elements. The event element has the following attributes:
o id – serves for cross-referencing within the configuration file and for internal processing; it should be unique throughout the whole configuration file.
o type – represents the type of the event as it is stored in the log database (typically identified by its URI).
§ <context> element (as a child of an event element) expresses the relation of the current event to the previous one by the value of some attribute. It has a type attribute, which can be either http://fiit.stuba.sk/loganalyzer#SameAsPrevious or http://fiit.stuba.sk/loganalyzer#DifferentThanPrevious. An <attribute> child element defines the type of the event attribute to which the contextual condition is applied.
§ <consequences> element wraps multiple <change> elements, each representing a change of one characteristic. Each change relates to one type of characteristic, expressed by one <characteristic> sub-element whose name attribute defines the type of the characteristic. A <change> element can also have multiple <property> child elements concerning properties of the characteristic. Each property has a name attribute containing the URI of the property in the ontology used. A property can be of three different types, distinguished by a type attribute:
o Referencing property – whose value is the value of an event or displayed item attribute(s).
o Used property – whose value is stated directly in the configuration file.
o Processed property – whose value is computed using an instruction present in the configuration file.
The <property> element can have several different attributes and child elements according to its type. See the javadoc for further details.
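The element structure described above can be made concrete with a small rules document parsed by the standard JAXP DOM API. The sketch is illustrative only: all example.org URIs, the id values, the literal "processed" used for the property type attribute, and the placement of the URI as text inside <attribute> are assumptions; the element and attribute names follow the description above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RulesFileSketch {
    // Illustrative rules file; URIs and the property type value are hypothetical.
    static final String SAMPLE =
          "<rules>"
        + "<rule id='http://example.org/rules#RepeatedDetail'>"
        + "<sequence id='s1' count-of-occurrence='2' isContinuous='true'>"
        + "<event id='e1' type='http://example.org/events#ShowDetail'>"
        + "<context type='http://fiit.stuba.sk/loganalyzer#SameAsPrevious'>"
        + "<attribute>http://example.org/attrs#Author</attribute>"
        + "</context>"
        + "</event>"
        + "</sequence>"
        + "<consequences>"
        + "<change>"
        + "<characteristic name='http://example.org/um#Interest'/>"
        + "<property name='http://example.org/um#confidence' type='processed'/>"
        + "</change>"
        + "</consequences>"
        + "</rule>"
        + "</rules>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        NodeList rules = doc.getElementsByTagName("rule");
        // Print a one-line summary of each rule: its sequence and number of changes.
        for (int i = 0; i < rules.getLength(); i++) {
            Element rule = (Element) rules.item(i);
            Element seq = (Element) rule.getElementsByTagName("sequence").item(0);
            int changes = rule.getElementsByTagName("change").getLength();
            System.out.println(rule.getAttribute("id")
                    + ": sequence " + seq.getAttribute("id")
                    + ", required occurrences " + seq.getAttribute("count-of-occurrence")
                    + ", changes " + changes);
        }
    }
}
```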
A.2.4 Integration Guide
LogAnalyzer is invoked when events stored in user logs need to be processed. This is done by calling the signal method of the ILogAnalyzer interface. When called, the tool processes all events with a timestamp between the time of the last run of LogAnalyzer and the current time. The time of the last run is maintained in the startTime property of the LogAnalyzer.properties file and can be changed manually if needed.
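The time-window behavior can be sketched as follows. The classes below are hypothetical stand-ins and do not reproduce the ILogAnalyzer interface, whose method signatures are not given here; the choice of a half-open interval (after the last run, up to and including the current time) is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a logged event; only the timestamp matters here.
class TimedEventSketch {
    final String type;
    final long timestampMillis;

    TimedEventSketch(String type, long timestampMillis) {
        this.type = type;
        this.timestampMillis = timestampMillis;
    }
}

class SignalWindowSketch {
    // Time of the last run, playing the role of the startTime property.
    private long startTime;

    SignalWindowSketch(long startTime) {
        this.startTime = startTime;
    }

    // Selects events logged after the last run and up to 'now',
    // then advances startTime so the next call skips them.
    List<TimedEventSketch> signal(List<TimedEventSketch> log, long now) {
        List<TimedEventSketch> batch = new ArrayList<>();
        for (TimedEventSketch e : log) {
            if (e.timestampMillis > startTime && e.timestampMillis <= now) {
                batch.add(e);
            }
        }
        startTime = now;
        return batch;
    }

    long getStartTime() {
        return startTime;
    }

    public static void main(String[] args) {
        List<TimedEventSketch> log = Arrays.asList(
                new TimedEventSketch("ShowList", 100L),
                new TimedEventSketch("ShowDetail", 200L),
                new TimedEventSketch("Back", 300L));
        SignalWindowSketch analyzer = new SignalWindowSketch(100L);
        System.out.println(analyzer.signal(log, 250L).size());
        System.out.println(analyzer.signal(log, 400L).size());
    }
}
```

Setting the start time back manually, as the text allows, would cause already processed events to fall into the window again.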
A.3.1 Tool Structure
LogAnalyzer consists of the following packages (structure and dependencies of the packages are displayed in Fig. 2):
§ sk.fiit.nazou.loganalyzer – the main package, which contains core files.
§ sk.fiit.nazou.loganalyzer.exception – defines exceptions thrown by the tool.
§ sk.fiit.nazou.loganalyzer.instances – contains classes that represent intermediate results, which can be persisted in a relational database.
§ sk.fiit.nazou.loganalyzer.kb – contains classes that represent the object model of the rules the tool works with.
§ sk.fiit.nazou.loganalyzer.model – contains classes for object oriented representation of user model (used for open user modeling purposes).
§ sk.fiit.nazou.loganalyzer.modelprovider – defines interface of user model provider and contains implementation of NAZOU user model provider (used for open user modeling purposes).
§ sk.fiit.nazou.loganalyzer.updateStrategies – defines an interface and an abstract strategy for updating a processed property, as well as implementations of three different update strategies.
§ sk.fiit.nazou.loganalyzer.usercharacteristicprovider – defines an interface of user characteristic provider and a factory for providing specific implementations of this interface.
§ sk.fiit.nazou.loganalyzer.usercharacteristicprovider.nazou – contains implementation of user characteristic providers for characteristics used in NAZOU project.
§ sk.fiit.nazou.loganalyzer.util – encapsulates classes providing supporting functionality.
Fig. 2. LogAnalyzer packages – structure and dependencies.
A.3.2 Method Implementation
Fig. 3 depicts the processing of events, i.e., the pattern detection performed by the LogAnalyzer tool. After a pattern is found, each change in the consequence part of the rule is executed: the tool retrieves the user characteristic determined by the referencing and used properties (creating it if it does not exist yet), updates all processed properties according to the given strategy, and updates additional metadata about the characteristic, such as its timestamp or count of updates.
Fig. 3. Sequence diagram – pattern detection.
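The execution of a single change (retrieve or create the characteristic, update the processed property, refresh the metadata) can be sketched as below. The in-memory map replaces the ontology-based user model, and the class names, the string key, and the example strategy are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical in-memory characteristic with one processed property and metadata.
class CharacteristicSketch {
    double confidence;   // processed property, updated by a strategy
    long timestamp;      // metadata: time of the last update
    int countOfUpdates;  // metadata: number of updates so far
}

class ChangeExecutionSketch {
    private final Map<String, CharacteristicSketch> model = new HashMap<>();

    // Retrieves the characteristic identified by 'key' (creating it if absent),
    // applies the strategy to its processed property and refreshes the metadata.
    CharacteristicSketch execute(String key, DoubleUnaryOperator strategy, long now) {
        CharacteristicSketch c = model.computeIfAbsent(key, k -> new CharacteristicSketch());
        c.confidence = strategy.applyAsDouble(c.confidence);
        c.timestamp = now;
        c.countOfUpdates++;
        return c;
    }

    public static void main(String[] args) {
        ChangeExecutionSketch exec = new ChangeExecutionSketch();
        // Example strategy (an assumption): move the value halfway towards 1.0.
        DoubleUnaryOperator halfway = v -> v + (1.0 - v) / 2;
        exec.execute("user1/interest/author", halfway, 1000L);
        CharacteristicSketch c = exec.execute("user1/interest/author", halfway, 2000L);
        System.out.println(c.confidence + " after " + c.countOfUpdates + " updates");
    }
}
```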
A.3.3 Enhancements and Optimizing
LogAnalyzer is implemented as a singleton, which ensures that the rule configuration file is loaded only once, when the JVM loads the LogAnalyzer class. At present, LogAnalyzer has not been optimized for fast response times, as it is not part of the user interface. The primary performance bottleneck is the ontological repository Sesame.
Future enhancements could introduce meta-rules that dynamically change the working set of rules, or dynamic rules whose parameters would be deduced automatically from the current domain context.
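An eagerly initialized singleton of the kind described can be sketched as follows. The class is a hypothetical stand-in, and the counter merely simulates loading the rule configuration; the actual LogAnalyzer class may be structured differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SingletonSketch {
    // Counts how often the (simulated) rule configuration is loaded.
    // Declared before INSTANCE so it is initialized first.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Created exactly once, when the JVM initializes this class.
    private static final SingletonSketch INSTANCE = new SingletonSketch();

    private SingletonSketch() {
        LOADS.incrementAndGet(); // stands in for parsing the rules file
    }

    static SingletonSketch getInstance() {
        return INSTANCE;
    }

    public static void main(String[] args) {
        SingletonSketch a = getInstance();
        SingletonSketch b = getInstance();
        System.out.println((a == b) + ", loads: " + LOADS.get());
    }
}
```

The JVM guarantees that static initialization of a class runs once and is thread-safe, so this form needs no explicit locking.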
A.4 Manual for Adaptation to Other Domains
The rule-based method of log analysis implemented by LogAnalyzer does not depend on the domain, but rather on the navigation model used. However, different domains may presume different types of characteristics derived from different user actions, which requires changing the rules LogAnalyzer works with.
A.4.1 Configuring to Other Domain
If a different user model structure is used, it is necessary to provide specific implementations of characteristic providers that implement the methods defined in sk.fiit.nazou.loganalyzer.usercharacteristicprovider.IUserCharacteristicProvider, and to provide the mapping from characteristics to these providers in UserCharacteristics.properties.
Update strategies better suited to a particular domain can be implemented using the sk.fiit.nazou.loganalyzer.updateStrategies.UpdateStrategy interface, optionally by deriving from AbstractUpdateStrategy in the same package.
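The interface-plus-abstract-base arrangement can be sketched as below. The types are hypothetical and deliberately carry a "Sketch" suffix: the methods of the real UpdateStrategy interface and AbstractUpdateStrategy class are not documented here, and clamping values to the interval [0, 1] is an assumption.

```java
// Hypothetical strategy contract: computes a new property value from the current one.
interface UpdateStrategySketch {
    double update(double currentValue);
}

// Hypothetical abstract base holding behavior shared by all strategies.
abstract class AbstractUpdateStrategySketch implements UpdateStrategySketch {
    // Template method: delegates to the subclass and clamps the result to [0, 1].
    public final double update(double currentValue) {
        return Math.max(0.0, Math.min(1.0, compute(currentValue)));
    }

    protected abstract double compute(double currentValue);
}

// Domain-specific strategy: adds a fixed step on every update.
class FixedStepStrategySketch extends AbstractUpdateStrategySketch {
    private final double step;

    FixedStepStrategySketch(double step) {
        this.step = step;
    }

    protected double compute(double currentValue) {
        return currentValue + step;
    }
}

class UpdateStrategyDemoSketch {
    public static void main(String[] args) {
        UpdateStrategySketch strategy = new FixedStepStrategySketch(0.5);
        System.out.println(strategy.update(0.25));
        System.out.println(strategy.update(0.75));
    }
}
```

Deriving from the abstract base keeps shared concerns in one place, so a new domain-specific strategy only has to supply the computation itself.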
Most dependencies are domain-independent and thus require no domain-specific adjustments. The UserLogs package is derived from the user log database filled by the SemanticLog tool. This data structure was designed to be sufficiently independent of the domain and navigation model actually used.