Jump to: navigation, search

SMILA/Specifications/UsefulLogMessages

< SMILA‎ | Specifications
Revision as of 11:33, 17 November 2008 by Pwissel.brox.de (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Apache log4j is the open source logging tool developed under the Jakarta Apache project. It is a set of APIs that allows developers to write log statements in their code and configure them externally, using properties files. Logging (e.g., in a.log file or in a SQL data bank) is a big help during the development time as well as later to the errors search with the customer. It is used primarily as a debugging tool. Smila is using a Apache log4j.

Current Problems

  • Too much Information in the log

File EILF.log where registers a log information for Smila contains too much not the helpful information. Most of all messages arrive from debug. For example, at indexation of file EILF logging file has 33 000 lines, the majority from which an extracted context. To search for the necessary information is not so conveniently. For example, at indexation of file EILF (for xml files) EILF.log has 33 000 lines, the majority from which an extracted context. Today it is easy to get lost and not to find the necessary records in a pack of messages. The surplus information litters the log.

  • Messages do not correspond to the level

From Short introduction to log4j: INFO - for the "high level" information; DEBUG - only for the software developer interesting messages. Now in INFO is written too much information for developer.

For example this information is written down in INFO. Isn't that so here approaches DEBUG more?

- AddPipeline/extensionActivity-activity-line-36: found service MimeTypeIdentifier, processing request -> request

- AddPipeline/extensionActivity-activity-line-50: found service LuceneIndexService, processing request -> request

- AddPipeline/extensionActivity-activity-line-43: no pipelet configuration found.

- AddPipeline/extensionActivity-activity-line-43: found pipelet org.eclipse.eilf.processing.pipelets.aperture.AperturePipelet, processing request -> request

- DeletePipeline/extensionActivity-activity-line-36: found service LuceneIndexService, processing request -> request

- Initialization of BPEL engine successful


  • Insufficiently informative messages of info level

This point is connected with the previous.

Discussion

Sebastian Voigt: My Suggestion is to change the default logging settings to minimize logging information. (Question is more, what do we have to see (ODE logging messages is an example). We have to see errors messages thrown by the components, but we don't want to see every message from ode etc...

Jürgen Schumacher: From a quick look at a quite old EILF.log on my machine, I think it should be sufficient to switch the log level for org.eclipse.eilf from DEBUG to INFO. Or am I getting anything wrong?

Thomas Menzel: My opinion on this:

1. logging is there to help you, too much just doesn't.

2. have INFO level to show where the process/program is roughly. if u have loops that are executed often or have many iterations than I think it is a good choice to just log every N iterations. the overhead for this is minimal and the output a welcome indicator for anybody concerned about the progress and speed of a process/loop.

3. DEBUG: give verbose info that is needed for debugging purposes. that includes most important the state of the process and involved objects

Allan Kaufmann: I think it could be helpful if users received errors and if log error gives important information but not too much. Currently the EILF.log shows much, so that exceptions and error like this could be overlooked.

Sofya Zhbankova: and that if to separate a logging information in the different files: debugging messages to one file (EILF.debug) and other messages to another file (EILF.log). Most of all messages arrive from debug . It seems to me that it is possible few to "unload" debug. For example, whether it is necessary to deduce in the file an extracted content.

For example, at indexation of file EILF (for xml files) EILF.log has 33 000 lines, the majority from which an extracted context. To search for the necessary information is not so conveniently.

Allan Kaufmann: In my opinion we should keep EILF.log as one log file that hold all logs of the different levels in order of the runtime, because while searching for information it´s difficult to look and sort with different log files. But maybe it could be helpful to received EILF.debug or EILF.error as additional file? Sample: Our Logger is set to DEBUG, so EILF.log received all Logs until DEBUG-level. But if a developer just wants information about errors, he could have a look to EILF.error. Disadvantage: More performance because information could log more than one time. If we choose the idea to separate the log in different files without a main file like EILF.log or otherwise, it´s necessary to design the log clearly, so that information could be found better and messages are more helpfully.

Ivan Churkin: As you already mentioned, the order of the runtime will be lost, so system have to support complete log file. Additional log files may be added but I'm not sure that its info will be enough. bwt, for example, GREP may be used for investigating/filtering log file.

Daniel Stucky: I agree with Allan to keep one log file. I think we should set up some guidelines for developers when and what to log on which level to provide meaningful log entries.

Then it's only a matter of configuration - we should provide a default log4j configuration for development (e.g. log level DEBUG) - we should provide a default log4j configuration for the application (e.g. log level INFO)

Everybody is free to change log4j settings in his/her development or application environment if those defaults don't fit.

Sofya Zhbankova: I agree with Daniel that "..everybody is free to change log4j settings..". For me to search in separate files it is much easier. Today it is easy to get lost and not to find the necessary records in a pack of messages. The surplus information litters the log. Can be all the technical information to write down in debug and in info - only final and short messages:

- Initializing Crawler...

- some documents are processed and stored into the index

Daniel Stucky: I added a section "Logging guidelines" to our Development_Guidelines: [[1]]. Feel free to comment or improve.

Sofya Zhbankova: From general: INFO - for the "high level" information; DEBUG - only for the software developer interesting messages. I think that for Smila reasonably too.

Thomas Menzel: I have some remark to:

  • log exceptions before throwing new exceptions, e.g.

... if( paramXY == null ) {

   if (_log.isErrorEnabled()) {
       _log.error("paramXY is not set");
   }
   throw new NullPointerException("paramXY is not set"); } ////


I advise against this as it only causes the same message to be repeated at least twice, namely when the exception is created and then when the exception is logged itself. IMO it is sufficient to have a verbose error msg and this then is reported along with the stack trace when the exception is logged. if the exception is handled then this message is even confusing b/c u have ERROR log entry but wonder: what happened then? if u deem the condition so extraordinary and dangerous we could log it as warning, but since we don’t log the exception here it should not be level ERROR.

even worse than this is sometimes the habit to log an exception before (re)throwing or wrapping it, which I have seen as often (in general, not this project in particular). this then has the effect that u have the same exception logged several times which makes the log very verbose and confusing. as u think u have lot of problems/exception when it fact it is only that one case.

of course this only applies when u remain within our own system bounds. if u write a library that is called from other code then it is wise to log the exception at ur system boundary, i.e. the method that was called by the client, that might not adhere to sound logging standards. again, there are also pitfalls with this convention, e.g. in case of an overloaded API method where u have a simple version that delegates to a more verbose other API method that also could be called from a client.

Log in the future

How to improve a log for Smila and what will be a log in the future?

The basic ideas stated by developers ware:

To change the default logging settings

The Smila is configured through a property file - log4j.properties. Configuring log4j typically involves assigning Logger, defining Appender, and specifying Layout objects. A logger logs to an appender in a particular layout (style). Loggers need to know where to send your requests for logging. This is where the appenders come into picture. log4j supports writing to files (FileAppender), to console (ConsoleAppender), to databases (JDBCAppender), to NT event logs (NTEventLogAppender), to SMTP servers (SMTPAppender), to remote servers (SocketAppender), and others.

A layout defines the style and content of the output log.

Each logger in log4j is assigned a level (DEBUG, INFO, WARN, ERROR, FATAL).

PatternLayout lets you specify the output format according to conversion patterns similar to the C language printf function.


The possible configuration log4j.properties for Smila

PatternLayout with the conversion pattern %x %d{ISO8601}; %-5p; [%-45t] ; %-45c{2} ;  %m%n will output something like:

2008-10-01 15:34:21,283; INFO ; [Start Level Event Dispatcher ] ; workspace.WorkspaceHelper  ;  Using default workspace.

File columns are separated by semicolons (.csv file, can be worked on directly, e.g., in Excel) and :

  • %x - is the nested diagnostic context (NDC) associated with the thread that generated the logging event. Useful when multiple clients are handled by separate threads such as in Java servlets.
  • %d{ISO8601} - is the time.
  • %-5p - is the priority of the log statement.
  • [%-45t] - is the thread making the log request.
  • %-45c{2} - is the name of the class associated with the log request.
  • %m%n - is the message of the statement with new line.

Use a appender to log to a file. Let's set default level DEBUG for package org.eclipse.eilf. For some classes which give much debug messages we reduce level to info.

Let's consider some interesting options of FileAppender:

  • FileAppender Option Threshold is useful to cut down the number of messages.

For example:

log4j.appender.file.threshold=INFO
Appender will not log any messages with priority lower than INFO even if the category's priority is set DEBUG.
  • Append=false: Default is true which means append to the end of the file. false overwrites the file at the start of each program run.
# The file overwrites at the start of each program run. 
log4j.appender.file.append=false


log4j.properties for Smila:

# Set root logger level to INFO and its only appender to file.
# Print only messages of priority INFO or higher.
 
log4j.rootLogger=INFO, file
 
# The appenders should not be accumulated from ancestors 
log4j.additivity.org.eclipse.eilf=false
 
# A appender writes to a file EILF.log
log4j.appender.file     = org.apache.log4j.FileAppender
log4j.appender.file.File    = EILF.log
log4j.appender.file.layout  = org.apache.log4j.PatternLayout
 
# The file overwrites at the start of each program run. 
log4j.appender.file.append=false
 
# Pattern to output in the file:     {d} time {p} level {t} thread name  {c} class name  {m} message
log4j.appender.file.layout.ConversionPattern=%x %d{ISO8601}; %-5p; [%-45t] ; %-45c{2} ;  %m%n
 
# default logger for package org.eclipse.eilf
log4j.logger.org.eclipse.eilf=DEBUG, file
 
# Extention default logger for classes
# Print only messages of priority ERROR or higher for these classes.
 
log4j.logger.org.eclipse.eilf.processing=ERROR, file
log4j.logger.org.eclipse.eilf.processing.bpel=ERROR, file
log4j.logger.org.eclipse.eilf.binstorage=ERROR, file
log4j.logger.org.eclipse.eilf.blackboard=ERROR, file
log4j.logger.org.eclipse.eilf.xmlstorage=ERROR, file
log4j.logger.org.eclipse.eilf.connectivity.deltaindexing= ERROR, file
log4j.logger.org.eclipse.eilf.connectivity.queue.worker=ERROR, file
log4j.logger.org.apache.ode.ODEServer=ERROR, file

To switch the log level for messages

The point especially concerns INFO and DEBUG levels. I think we should adhere to the general rules of use of levels:

INFO - Interesting runtime events (startup/shutdown). Expect these to be immediately visible on a console, so be conservative and keep to a minimum.

DEBUG - detailed information on the flow through the system. Expect these to be written to developers only.

( See log4j javadocs: The DEBUG Level designates fine-grained informational events that are most useful to debug an application. The INFO level designates informational messages that highlight the progress of the application at coarse-grained level. )

Really, it would be desirable to receive "easier" information in INFO. Like as:

- Smila started successfully

- indexing file system AAA starting

- folder for (xml, txt, html)

- file A indexing successfully - updated

- file B indexing successfully - touched

- file C indexing successfully - deleted

... - Total indexing 421 files

- touched 400 files

- updated 10 files

- deleted 11 files

- indexing time 8909 ms


To write down a guidelines for developers

One of results of the discussion is occurrence a section "Logging guidelines".

To output a logging information in the different files

Though this idea has not found support in developers, the debugging data recording in a separate file can be accepted also.

If you want to output logging information in different files it should be helpful to write a xml-configuration-file. In this way it is possible to define two or more different appender for the same logger. By using a LevelRangeFilter (which is only supported in xml) you could specify an own log level for every appender.

In this example two FileAppender were used. The first FileAppender (DebugFileAppenter) only log messages of the DEBUG level and the second FileAppender (OtherFileAppender) log all messages of more interesting level (INFO – FATAL)

After defining the appender with layout, filter and other parameter (maybe filename for FileAppender) it is necessary to reference the appender to the logger (here: RootLogger) at the end of the configuration file.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration>
 
	<!--define first appender-->
	<appender name="DebugFileAppender" class="org.apache.log4j.FileAppender">
		<!--specify the output file-->
		<param name="file" value="debugLogFile.log"/>
		<!--define a layout-->
		<layout class="org.apache.log4j.PatternLayout">
			<param name="ConversionPattern" value="%d{dd MMM yyyy HH:mm:ss} - %p - %m - @%l %n"/>
		</layout>
		<!--set a filter-->
		<filter class="org.apache.log4j.varia.LevelRangeFilter">
			<param name="LevelMin" value="DEBUG"/>
			<param name="LevelMax" value="DEBUG"/>
			<param name="AcceptOnMatch" value="true"/>
		</filter>
	</appender>
 
	<!--define second appender-->
	<appender name="OtherFileAppender" class="org.apache.log4j.FileAppender">
		<!--specify the output file-->
		<param name="file" value="otherLogFile.log"/>
		<!--define a layout-->
		<layout class="org.apache.log4j.PatternLayout">
			<param name="ConversionPattern" value="%d{dd MMM yyyy HH:mm:ss} - %p - %m - @%l %n"/>
		</layout>
		<!--set a filter-->
		<filter class="org.apache.log4j.varia.LevelRangeFilter">
			<param name="LevelMin" value="INFO"/>
			<param name="LevelMax" value="FATAL"/>
			<param name="AcceptOnMatch" value="true"/>
		</filter>
	</appender>
 
	<!--reference the appender to a logger-->
	<root>
		<priority value="All"/>
		<appender-ref ref="DebugFileAppender"/>
		<appender-ref ref="OtherFileAppender"/>
	</root>
 
</log4j:configuration>

At last you invoke you xml-configuration-file over the DomConfigurator in you java code.

import org.apache.log4j.xml.DOMConfigurator;
 
public void LoggerProperties() {
    DOMConfigurator.configureAndWatch("log4j.xml");
  }


Additional logging: Records lifecycle information

The Path of a record that travels through our framework maybe seems a bit obscure. Highlighting this logging messages to get a better overview requires a new class: org.eclipse.smila.utils.log.RecordLifecycleLogHelper

This helper class uses the same Logger as all logging events, but with a predefined name (“Records”) and put id information into the log message.

To log a message regarding a record keep with this scheme.

if (RecordLifecycleLogHelper.isRecordStateLogEnabled()) {
    RecordLifecycleLogHelper.logRecordState(String message, String idHash);
  }

Methods in detail

//Logger name.
private static final String RECORDS_LOGGER = "Records";
 
//Checks if logging of record state is enabled.
public static boolean isRecordStateLogEnabled() {
    final Log log = LogFactory.getLog(RECORDS_LOGGER);
    return log.isInfoEnabled();
  }
 
//Logs record state.
public static void logRecordState(String message, String idHash) {
    final Log log = LogFactory.getLog(RECORDS_LOGGER);
    if (log.isInfoEnabled()) {
      log.info(message + ", record id=" + idHash);
    }
  }

One short log file example

2008-11-17 14:53:35,755 INFO  [Thread-14       ]  Records  - Record was crawled by dataSource=file, record id=53b573f1afa23a3c176c9567f8f
2008-11-17 14:53:35,771 INFO  [Thread-14       ]  Records  - DeltaIndexing data is new or changed, record id=53b573f1afa23a3c176c9567f8f
2008-11-17 14:53:35,927 INFO  [Thread-14       ]  Records  - Record is routed with rule [Default ADD Rule] and operation [ADD], record id=53b573f1afa23a3c176c9567f8f9a6
2008-11-17 14:53:35,958 INFO  [Thread-8        ]  Records  - Record is processed by Listener with rule: [Default ADD Rule] and operation [ADD], record id=53b573f1afa23a3c176c9567f8f9a6
2008-11-17 14:53:36,426 INFO  [ODEServerImpl-1 ]  Records  - Record added to lucene index, record id=53b573f1afa23a3c176c9567f8f9a6

Links

log4j Javadocs

Official log4j Homepage

Don't Use System.out.println! Use Log4j.