Re: [JDBC] New driver logging proposal - Mailing list pgsql-jdbc

From Daniel Migowski
Subject Re: [JDBC] New driver logging proposal
Date
Msg-id 41ED3F5450C90F4D8381BC4D8DF6BBDCAF6CE94C@EXCHANGESERVER.ikoffice.de
Whole thread Raw
In response to Re: [JDBC] New driver logging proposal  (Jorge Solórzano <jorsol@gmail.com>)
Responses Re: [JDBC] New driver logging proposal  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Re: New driver logging proposal  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-jdbc

Hello Jorge,

 

thanks for your critical comments. My answers are also between the lines.

 

Summary:

 

·         Leave the current logger configuration by properties as it is for users without a configured logging system.

·         Show connection IDs in the log messages to identify connections, and not in logging categories.

·         Still use custom logging wrapper for fast deactivation of logging on a connection base and to inject connection ID into log messages.

·         Enable logging by a “logging=true” configuration parameter, by default it is disabled. Non-connection-specific logging will still be enabled by default, but that’s not required because performance won’t be hurt by it.

 

Regards,

Daniel Migowski

 

Von: Jorge Solórzano [mailto:jorsol@gmail.com]
Gesendet: Samstag, 8. April 2017 20:15

On Fri, Apr 7, 2017 at 11:13 PM, Daniel Migowski <dmigowski@ikoffice.de> wrote:

this is my proposal for a new PostgreSQL JDBC Driver logging. The aim for this is to merge the efforts done in integrating common logging frameworks while keeping the performance we had before logging frameworks were introduced. Also this proposal should add the functionality to identify connections in the logging output, a feature that was lost when a switch to logging frameworks was made.

 

+1 for the functionality to identify connections, it was on my TODO list.

 

OK, then this will be done.

Current (and previous) state: 

I assume that people who do logging already use a logging framework of their choice, either because they configured it themselves or by using the one provided by the container they use. In either way they know how to configure logging on the platform they chose.

 

Sadly you should never "assume" anything of what people use, I still know people that uses System.out.println() to debug.

 

Alright, so we still need something to help those people out.

 

Before logging framework support was introduced we had a single logging related Driver parameter called “loglevel”. This could be set to “0” for zero logging, to “1” for a log level corresponding to “DEBUG” in modern logging frameworks, and to “2” for “TRACE” level logging. There was (and seemingly isn’t) a need for higher severity log levels like “INFO”, “WARN”, “ERROR” or “FATAL” because events that would cause these are thrown as exceptions in the error case while warnings and infos are not generated at all from the driver. Logout was written to a LogWriter when this was set on the driver, which would in most cases be used to redirect logout to the logging framework one had chosen.  

The newer versions of the driver allow the parameters “loggerLevel” and “loggerFile”. These choices are intended to interfere with the logging framework one uses because change the loglevel programmatically on the logging category “org.postgresql”. Also with “loggerFile” one is able to configure a special logfile for the logging category.

 

These options are to facilitate the configuration of the logger for those people that don't know or just want an easy way to enable logging.

 

OK, I will leave these options in and don’t care for people that want to use the driver that way.

 

The current implementation has the side effect that creating a single connection will change the log level of ALL connections (loaded by the same ClassLoader to be specific) because the loggers are statically defined in the classes which is how loggers are used today. Also, although can be given as connection parameters, they are not connection specific at all. This might not be a problem when one uses only a single connection pool where all connections are created equally, but interferes when one has an application that occasionally has to fetch data from multiple data sources.

 

Do you intent to initialize a logger for each connection?
You don't need to use connection parameters to use the logger, you can also use a properties file too.

 

Yes, indeed. But that might depend on how the user configures the connectionIdentifier. If connections don’t have to be identified in the output we can reuse the loggers. If connections are to be identified, each connection will get it’s own logger, because I’d like to add connection information in the logging category. But now, while I think about it, we could also just prepend a connection-identifying string to the log messages. It would be

 

    2017-04-09 09:59:24,763 [main] DEBUG org.postgresql.sql - CON-123: My logging text

 

vs.

 

    2017-04-09 09:59:24,763 [main] DEBUG org.postgresql.sql.123 - My logging text

 

in the logfiles. Personally I’d even like the message prepending more, so you are right, that should be the way to go. I didn’t know that one could configure the driver with a properties file, maybe this is useful in that case.

 

 Also, the current implementation makes have use of TRACE level logging in all getXXX functions of PgResultSet. A typical line is this:

      LOGGER.log(Level.FINEST, "  getString columnIndex: {0}", columnIndex);

This innocent looking line should do no harm at a first glance, because if tracing is disabled the function should leave immediately and  no performance is lost. One minor problem is the usage of the columnIndex as parameter, which is autoboxed into an Integer while passed to the log function (involves a hidden call to Integer.valueOf(int), which even creates new Objects depending on how aggressive the VM is tuned and on the number of columns in the result set.

 

This has to be backed up by a benchmark, the autobox should be cheap and Integer.valueOf(int) will always cache values in the range -128 to 127, inclusive.

 

Depends on the VM settings. This might even be much higher but in my testcase I had an absurdly large table with >400 columns, and there I noticed the impact. It is nonetheless not for free.

 

More important is the implementation of the log function itself, which is relative fast when a good Logging Framework is used, and disastrously slow in many other cases like when Logback is used. In that case an expensive log event is created and passed to the really used logging framework just to be discarded there, but this can IMHO be considered an artefact of a badly configured framework (still throws a bad light on PostgreSQL JDBC performance).

The logging framework currently used is java.util.logging, so if you use the Simple Logging Facade (slf4j) and redirect to another implementation using a bridge then there will be some overhead, with Logback for instance you need to configure the LevelChangePropagator. 

 

Yes, I know, and personally I am still using log4j. Even with log4j configured (which seems to do something like a level change automatically, else the performance hit would be even worse), I could feel 20% performance improvements in my benchmarks just by outcommenting the log statements. The time I measured even included doing the actual query and my local PostgreSQL server to respond.  

 

 Proposed state:

I believe the common case is that logging for the driver is disabled. For this I propose a Boolean configuration parameter “logging” that can be truly configured per connection. Except for a few places where no reference to a connection exists, this will silent at least all connection specific logging.

I'm not sure why introduce another parameter when the logLevel = OFF can be used for the same. Do you have an idea about how "truly configured per connection" can be archived? If I simply don't use the connection parameters and use the method of configuration provided by my app-server, how will that work?

 

I wouldn’t rely simply on the logging configuration to disable logging because I want to add an extra layer that turns off logging completely (on a per-connection-base). See below where I explain that. I would even make that the default because it allows the JVM to nearly completely ignore the calls to the logging framework.

 

Regarding the “logging” parameter: Currently, when configured to something else than OFF it automatically configures the logging handler for the “org.postgresql” category and does not allow to use your own logging configuration at all!

 

But maybe I want logging at a given level but still don’t want to use that for the whole driver, just for maybe the “org.postgresql.connect” category. Then I would enable logging with “logging=true” and let the user configure their log levels for themselves. I would think a good default value for “logging” is “false” when logLevel = “OFF” or logLevel is not configured, and “true” in all other cases.

 

So with my new parameter, if you want logging you can either

 

·         use logging=true and use your own logger configuration, or

·         use loggerLevel=XXX and let the driver configure logging for you, either to a file specified by “loggerFile”, or to Driver.getLogStream(), or to stderr.

 

As I said, the “current” implementation would assume logging=true, but the overhead for that is just too high, so I need a way to disable connection specific logging completely. The numbers I can provide after implementing that will prove me right.

 

 The log level and log destination should be configured by the normal means of the logging framework,

​​exactly how it is done in every other library that provides some means of logging.

 

What is the normal means of the logging framework? and exactly how it's done in every other library that provides some means of logging?

 

Just wanted to say that when logging is used you configure the log files yourself, and don’t have the library you use do it. I even don’t feel good with a library that fiddles with my carefully configured logging configs, but if this is what people want, accepted.  

 

I saw from the discussion while introducing logging frameworks that there was demand for programmatic configuration of log levels and log destinations, but since the current implementation does that on the Driver level anyway, this should be done by calling static functions on it, not by giving connection parameters (this can be discussed). The reason that connection pools can be configured easier when these parameters are available to the connection is not valid, because either the connection pool is created in a container, and then logging is configurable in the same textual way anyway and the container user is used to it, or the connection pool is created programmatically, and then one is also able to configure the driver programmatically on a driver level. Therefore we can drop the parameters “loggerLevel” and “loggerFile”, replacing them with helper methods on Driver that just configure the java.util.Logging framework.

 

The connection parameters are there to facilitate the configuration of the logger, and just for those rare cases when there is a need to enable and discard scenarios, it's not the recomended aproach and not intended to be used in production.

 

Finally it might be nice to be able to identify the connection that logs. I like the idea of ??? to encode it in the logging category.

 

Yes, it might be nice to identify the connection, what do you mean by ???

 

Damn, I should have prove read this. I meant https://github.com/vlsi.
 

Also usually when logging is used the category is the package that logs. It is common to have different logging categories to be able to filter the log messages. To be able to combine both I propose a fixed set of logging categories (<id> is the connection id):

 

    org.postgresql.connect.<id>  ß Logs connecting and disconnecting events. Can log more or less depending on Level. Level INFO logs simple connection/disconnection. Level DEBUG also logs parameters given to connection. Level TRACE logs the whole connection setup including default parameter settings. This also logs WARN messages when connections are finalized without having been closed explicitly.

    org.postgresql.sql.<id>  ß Logs the SQL queries executed on the backend. Also logs Bind, Prepare and Execute for PreparedStatements. Level INFO logs all SQL here. Level DEBUG could also log all the parameters given to statements. Level TRACE could even log the returned results.

    org.postgresql.api.<id>  ß Logs all interaction with the driver through externally callable functions.

    org.postgresql.protocol.<id>  ß Logs on the protocol level. Should always use TRACE level.

    org.postgresql.pool.<id>  ß Logs connection pool related events. On Level INFO we could print simple pool usage like retrieving or releasing connections. Maybe there is useful logging on DEBUG level but I don’t know for now.  

   org.postgresql.XXX  ß Some more internal logging categories for places where we don’t have access to the connection itself. These will always log at TRACE level and are not called very often. I will remove potentially expensive logging in place called often like in static helper functions of BooleanTypeUtil.

 

I really like the idea of have a fixed set of logging categories. +1

 The connection id could , depending on your use case, simply be an Integer used to identify the connection. This could be made optional and even configurable because maybe one wants to encode more information in it. I propose a Driver-level ConnectionIdFactory that by default just returns uses an incrementing id for the connection, but personally I’d like to have the database name in it also. Not only a ConnectionLevel Id might be used but also a DataSource id. Is also required to intentify for datasource specific logging.

 

I'm not sure if introduce a connection id it's a good idea, just for the record the logs of the Drivers should be used to debug the driver mostly, and with this it seems to fix a problem that it's not related to the Driver.


The connection ID  was meant as a way to identify the connection. As said above this does not need to be included in the category, but could also just be prepended to log message texts.

 

At least in my application I found it useful to set the log level for all third party libraries to WARN, while would also effectively discard most logging from the JDBC driver even in the case where logging is disabled. With the above categories I now could specifically describe what I am interested in, and I believe I covered most use cases one would use logging for in it. It got SQL logging for the end user, protocol logging for the driver developers, API logging for the end users and application profilers, stuff to detect connection leaks and parameters for an administrator.   ​ 

Implementation details:

To achive all this I propose to replace most of the loggers currently used with a PgLogger implementation that wraps the usual. We could have a Null Logger here that does nothing, and I hope for the VM to simply drop all calls to it in compiled code completely when we won’t ever instantiate any other implementations of PgLogger, so this will allow for peak performance in the default case. This class will have log functions for each level so we spare the Level-argument in calls to the functions.


This smells like the "custom logging abstraction" anti-pattern.

 

Yes, and it smells good in this context J. The reason for it is that the logger will contain the connection id and can use it in the log messages. AND most important, the JVM will create extremely short codepaths for the case where the Null logger is configured.
 

 Other implementations of that class really do logging. Depending on if we want connection level logging we could have individual instances of the Loggers per connection and category or just by category.

 

Do you intent to change to SLF4J?

 

No, I don’t like PostgreSQL JDBC to have dependencies to other libraries.

pgsql-jdbc by date:

Previous
From: Jorge Solórzano
Date:
Subject: Re: [JDBC] New driver logging proposal
Next
From: Philippe Marschall
Date:
Subject: [JDBC] [pgjdbc/pgjdbc] 4ab5cc: feat: support fetching a REF_CURSOR usinggetObjec...