Thread: Change in Log Format and Prepared Statements
Hi, Over the past year we have performed upgrades to both our PostgreSQL server our jdbc driver over the last few months: From PostgreSQL 8.2.10 with pg74.216.jdbc3 driver To PostgreSQL 8.2.10 with pg8.2-511.jdbc4 driver To PostgreSQL 9.1.4 with pg9.1-902.jdbc4 driver (O/S is Centos 5) When we moved from the pg74.216.jdbc3 driver to the pg8.2-511.jdbc4 we saw a distinct change in the logging output by the database. Previously with the pg74.216.jdbc3 driver we saw the statement followed by a duration: 2012-09-13 13:08:48 BST [22116] [mp_demo] [pg] LOG: statement: SELECT users.USERID, users.LOGINID, users.ORGID, users.FIRSTNAME, users.LASTNAME, users.TITLE, users.TELEPHONE, users.CELLPHONE, users.PAGER, users.FAX, users.EMAIL, users.LASTLOGIN, users.ADDRESSID, users.USERSTATUSID, users.SHARED, users.MODDATE, users.ISPROXY FROM users WHERE users.LOGINID='alex-admin'; 2012-09-13 13:08:48 BST [22116] [mp_demo] [pg] LOG: duration: 0.805 ms Now with the pg8.2-511.jdbc4 and pg9.1-902.jdbc4 drivers we see the statement issued as an execute and three durations: 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: execute <unnamed>: SELECT users.USERID, users.LOGINID, users.ORGID, users.FIRSTNAME, users.LASTNAME, users.TITLE, users.TELEPHONE, users.CELLPHONE, users.PAGER, users.FAX, users.EMAIL, users.LASTLOGIN, users.ADDRESSID, users.USERSTATUSID, users.SHARED, users.MODDATE, users.ISPROXY FROM users WHERE users.LOGINID='alex-admin' 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: duration: 0.208 ms 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: duration: 1.136 ms 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: duration: 0.156 ms Indeed we still see the former if I run the same statement direct from psql. So it looks to me as though the later drivers may be making use of prepared statements (hence the "execute"?) but why do we now see 3 durations? Also, as far as I can every single statement coming from the jdbc driver is being issued as "execute <unnamed>: ......". Even BEGIN, COMMIT and ROLLBACK statements are being issued in the same way and result in 2 or 3 duration statements. Is this normal? If so, I am not convinced that we really want to use prepared statements all the time, as it doesn't appear that statements are being repeated (repetition being my understanding of why you would want to make use of prepared statements). After looking at the documentation, I have tried using the prepareThreshold parameter on the connection string in my Tomcat server.xml (Tomcat v.6.0.29) to disable prepared statements, e.g.: url="jdbc:postgresql://servername:5434/mp_demo?prepareThreshold=0" .....but this doesn't seem to have any effect. Could anyone tell me what is going on and how to effectively control this jdbc behaviour? I am concerned the continual use of prepared statements for one-time-only statements could be a performance drain on our system. Alex Stanier. This message has been scanned for malware by SurfControl plc. www.surfcontrol.com
Yes, the new driver uses V3 protocol which uses prepared statements, named and un-named. I think the 3 durations may be due to the prepare and execute. Dave Cramer dave.cramer(at)credativ(dot)ca http://www.credativ.ca On Thu, Sep 13, 2012 at 8:47 AM, Alexander Stanier <alexander.stanier@egsgroup.com> wrote: > Hi, > > Over the past year we have performed upgrades to both our PostgreSQL server > our jdbc driver over the last few months: > > From PostgreSQL 8.2.10 with pg74.216.jdbc3 driver > To PostgreSQL 8.2.10 with pg8.2-511.jdbc4 driver > To PostgreSQL 9.1.4 with pg9.1-902.jdbc4 driver > > (O/S is Centos 5) > > When we moved from the pg74.216.jdbc3 driver to the pg8.2-511.jdbc4 we saw a > distinct change in the logging output by the database. > > Previously with the pg74.216.jdbc3 driver we saw the statement followed by a > duration: > > 2012-09-13 13:08:48 BST [22116] [mp_demo] [pg] LOG: statement: SELECT > users.USERID, users.LOGINID, users.ORGID, users.FIRSTNAME, users.LASTNAME, > users.TITLE, users.TELEPHONE, users.CELLPHONE, users.PAGER, users.FAX, > users.EMAIL, users.LASTLOGIN, users.ADDRESSID, users.USERSTATUSID, > users.SHARED, users.MODDATE, users.ISPROXY FROM users WHERE > users.LOGINID='alex-admin'; > 2012-09-13 13:08:48 BST [22116] [mp_demo] [pg] LOG: duration: 0.805 ms > > Now with the pg8.2-511.jdbc4 and pg9.1-902.jdbc4 drivers we see the > statement issued as an execute and three durations: > > 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: execute <unnamed>: > SELECT users.USERID, users.LOGINID, users.ORGID, users.FIRSTNAME, > users.LASTNAME, users.TITLE, users.TELEPHONE, users.CELLPHONE, users.PAGER, > users.FAX, users.EMAIL, users.LASTLOGIN, users.ADDRESSID, > users.USERSTATUSID, users.SHARED, users.MODDATE, users.ISPROXY FROM users > WHERE users.LOGINID='alex-admin' > 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: duration: 0.208 ms > 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: duration: 1.136 ms > 2012-09-13 13:07:33 BST [21664] [mp_demo] [pg] LOG: duration: 0.156 ms > > Indeed we still see the former if I run the same statement direct from psql. > So it looks to me as though the later drivers may be making use of prepared > statements (hence the "execute"?) but why do we now see 3 durations? > > Also, as far as I can every single statement coming from the jdbc driver is > being issued as "execute <unnamed>: ......". Even BEGIN, COMMIT and ROLLBACK > statements are being issued in the same way and result in 2 or 3 duration > statements. Is this normal? > > If so, I am not convinced that we really want to use prepared statements all > the time, as it doesn't appear that statements are being repeated > (repetition being my understanding of why you would want to make use of > prepared statements). After looking at the documentation, I have tried using > the prepareThreshold parameter on the connection string in my Tomcat > server.xml (Tomcat v.6.0.29) to disable prepared statements, e.g.: > > url="jdbc:postgresql://servername:5434/mp_demo?prepareThreshold=0" > > .....but this doesn't seem to have any effect. Could anyone tell me what is > going on and how to effectively control this jdbc behaviour? I am concerned > the continual use of prepared statements for one-time-only statements could > be a performance drain on our system. > > Alex Stanier. > > > This message has been scanned for malware by SurfControl plc. > www.surfcontrol.com > > > -- > Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-jdbc
To clarify, when prepareThreshold is set to zero, the driver still uses prepared statements, but it uses only the unnamed variant, which should effectively have no overhead compared to inlining parameters. The separate log messages are due to this change. Safely passing in parameters is an important reason to use prepared statements (perhaps more so than performance), so as to protect against SQL injection. For what it's worth, based on the logs, it looks like you're not doing that; you should seriously consider making that change, especially if data like user names is coming externally.
On 09/14/2012 03:24 AM, Dave Cramer wrote: > Yes, the new driver uses V3 protocol which uses prepared statements, > named and un-named. I think the 3 durations may be due to the prepare > and execute. I noticed this as well, and wasn't entirely thrilled with the effects on logging. I wonder if Pg can be convinced to suppress the non-execution times, or merge them to report a single statement execution time. I'll poke -hackers. -- Craig Ringer
<font size="-1"><font face="Helvetica, Arial, sans-serif">Thanks Dave and Craig for you input.<br /><br /> Does anyone knowwhat the three durations specifically represent? Presumably two of them are the time to prepare and execute the statement?What's the third? Is the overall execution time the sum of the three (or two in some cases)?<br /><br /> It wouldbe useful to have one summary time for the overall execution of each statement as before. We wrote a little Java programwhich analyses log file based on a statement followed by single duration (and outputs the most frequent, the slowestand most costly queries). We could rewrite the program to cope with the new format if needs be but if we can get asingle duration it would save us some work. I don't mind using prepared statements if there really is no overhead and Ican fully understand what each part of the new log format is.<br /><br /> Alex Stanier.<br /><br /></font></font><div class="moz-cite-prefix">On17/09/2012 06:15, Craig Ringer wrote:<br /></div><blockquote cite="mid:5056B201.5020809@ringerc.id.au"type="cite">On 09/14/2012 03:24 AM, Dave Cramer wrote: <br /><blockquote type="cite">Yes,the new driver uses V3 protocol which uses prepared statements, <br /> named and un-named. I think the 3durations may be due to the prepare <br /> and execute. <br /></blockquote><br /> I noticed this as well, and wasn't entirelythrilled with the effects on logging. <br /><br /> I wonder if Pg can be convinced to suppress the non-executiontimes, or merge them to report a single statement execution time. I'll poke -hackers. <br /><br /> -- <br />Craig Ringer <br /><br /><br /><br /> This email has been scanned for viruses by Blackspider's Mail Control System. Formore information please visit <a class="moz-txt-link-freetext" href="http://www.blackspider.com">http://www.blackspider.com</a> </blockquote><br /><br /><br /><p align="center"><font style="BACKGROUND-COLOR:#ffffff">This message has been scanned for malware by SurfControl plc. </font><a href="http://www.surfcontrol.com/"><fontcolor="#000000" style="BACKGROUND-COLOR: #ffffff">www.surfcontrol.com</font></a>
<font size="-1"><font face="Helvetica, Arial, sans-serif">Thanks Maciek for your advice which I will pass back to our Javadevelopers. We are using a combination of Torque and Hibernate to access the database via jdbc. I think Hibernate passesparameters as you suggest but Torque doesn't. We are slowly moving away from Torque to Hibernate.<br /><br /> Do youknow if there is any documentation on the "unnamed variant" of prepared statements? I've read the pages on PREPARE andEXECUTE but they don't go into much depth about what happens in the background.<br /><br /> Alex Stanier.<br /><br /></font></font><divclass="moz-cite-prefix">On 13/09/2012 21:12, Maciek Sakrejda wrote:<br /></div><blockquote cite="mid:CAOtHd0Df2_L=k+OxpWgNXso2w298V=D1GY7RDfwczTta=WjaJg@mail.gmail.com"type="cite"><pre wrap="">To clarify, when prepareThresholdis set to zero, the driver still uses prepared statements, but it uses only the unnamed variant, which should effectively have no overhead compared to inlining parameters. The separate log messages are due to this change. Safely passing in parameters is an important reason to use prepared statements (perhaps more so than performance), so as to protect against SQL injection. For what it's worth, based on the logs, it looks like you're not doing that; you should seriously consider making that change, especially if data like user names is coming externally. This email has been scanned for viruses by Blackspider's Mail Control System. For more information please visit <a class="moz-txt-link-freetext" href="http://www.blackspider.com">http://www.blackspider.com</a> </pre></blockquote><br /><br /><br /><p align="center"><font style="BACKGROUND-COLOR: #ffffff">This message has been scannedfor malware by SurfControl plc. </font><a href="http://www.surfcontrol.com/"><font color="#000000" style="BACKGROUND-COLOR:#ffffff">www.surfcontrol.com</font></a>
On 09/17/2012 02:58 PM, Alexander Stanier wrote: > Thanks Maciek for your advice which I will pass back to our Java > developers. We are using a combination of Torque and Hibernate to access > the database via jdbc. I think Hibernate passes parameters as you > suggest but Torque doesn't. We are slowly moving away from Torque to > Hibernate. > > Do you know if there is any documentation on the "unnamed variant" of > prepared statements? I've read the pages on PREPARE and EXECUTE but they > don't go into much depth about what happens in the background. The v3 protocol uses something like PREPARE and EXECUTE, but not those statements as such. You want the documentation on the fe/be protocol: http://www.postgresql.org/docs/current/static/protocol.html http://www.postgresql.org/docs/current/static/protocol-overview.html#PROTOCOL-QUERY-CONCEPTS http://www.postgresql.org/docs/current/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY For the exact communications used by PgJDBC you're best off checking the sources. I haven't read the wire protocol / backend part yet. -- Craig Ringer
On 09/17/2012 02:45 PM, Alexander Stanier wrote: > Thanks Dave and Craig for you input. > > Does anyone know what the three durations specifically represent? > Presumably two of them are the time to prepare and execute the > statement? What's the third? Is the overall execution time the sum of > the three (or two in some cases)? I have no idea, honestly. Maybe ask on pgsql-general ? > It would be useful to have one summary time for the overall execution of > each statement as before. I strongly agree. Three separate log lines for timing, with no indication about which is which, sucks. I wonder what'd happen if a more informative log_line_prefix were added... -- Craig Ringer
Craig Ringer <ringerc@ringerc.id.au> writes: > On 09/17/2012 02:45 PM, Alexander Stanier wrote: >> Does anyone know what the three durations specifically represent? > I have no idea, honestly. Maybe ask on pgsql-general ? Parse, bind, execute phases of extended-query protocol? http://www.postgresql.org/docs/9.2/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY > I strongly agree. Three separate log lines for timing, with no > indication about which is which, sucks. AFAICT from looking at the backend code, they are labeled ... regards, tom lane
On 09/19/2012 02:32 PM, Tom Lane wrote: > Craig Ringer <ringerc@ringerc.id.au> writes: >> On 09/17/2012 02:45 PM, Alexander Stanier wrote: >>> Does anyone know what the three durations specifically represent? > >> I have no idea, honestly. Maybe ask on pgsql-general ? > > Parse, bind, execute phases of extended-query protocol? > > http://www.postgresql.org/docs/9.2/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY > >> I strongly agree. Three separate log lines for timing, with no >> indication about which is which, sucks. > > AFAICT from looking at the backend code, they are labeled ... Yes, they have command tags, and if a suitable log_line_prefix like: log_line_prefix = 'cmd=%i pid=%p ' is set they are visible: cmd=INSERT pid=4067 LOG: execute <unnamed>: INSERT INTO hold VALUES (1) cmd=INSERT pid=4067 LOG: duration: 1.454 ms cmd=PARSE pid=4067 LOG: duration: 0.066 ms cmd=PARSE pid=4067 LOG: duration: 0.028 ms cmd=BIND pid=4067 LOG: duration: 0.007 ms but that just makes things more confusing. What's with the two PARSEs? and why is the INSERT logged *before* the PARSE and BIND? Does all this verbosity really need to be displayed by `log_duration` even when `log_min_messages` is at the default? It feels like implementation detail / debug trace stuff, not something that should be visible to day to day users and admins who just want to log statement durations. They don't care if the client is using the extended query protocol or not. I think this is a bit of a usability wart. How, at the default log level, does it convey any useful information whatsoever? LOG: execute <unnamed>: INSERT INTO testrs VALUES (1) LOG: duration: 0.109 ms LOG: duration: 0.026 ms LOG: duration: 0.031 ms LOG: duration: 0.005 ms LOG: execute <unnamed>: INSERT INTO testrs VALUES (2) LOG: duration: 0.029 ms LOG: duration: 0.023 ms LOG: duration: 0.021 ms LOG: duration: 0.004 ms There's no reliable way to associate the durations with the statements in case of interleaving, and nothing that tells you what the durations are. Here's what I'd want to see as the *default: LOG: [4067] execute <unnamed>: INSERT INTO testrs VALUES (1) LOG: [4067] duration: 0.161 ms If durations are logged on a separate line, a reliable way to associate them - be it backend pid, vtxid, or whatever - needs to be logged by default, and IMO the durations need to be reported as a single overall time with the command-tag of the associated command (if shown in user's log_line_prefix). Thoughts/opinions? -- Craig Ringer
Craig Ringer <ringerc@ringerc.id.au> writes: > On 09/19/2012 02:32 PM, Tom Lane wrote: >> AFAICT from looking at the backend code, they are labeled ... > Yes, they have command tags, and if a suitable log_line_prefix like: > log_line_prefix = 'cmd=%i pid=%p ' > is set they are visible: > cmd=INSERT pid=4067 LOG: execute <unnamed>: INSERT INTO hold VALUES (1) > cmd=INSERT pid=4067 LOG: duration: 1.454 ms > cmd=PARSE pid=4067 LOG: duration: 0.066 ms > cmd=PARSE pid=4067 LOG: duration: 0.028 ms > cmd=BIND pid=4067 LOG: duration: 0.007 ms Oh. You are talking about log_duration output, which is *meant* to be terse. It's designed that way for high-query-volume installations that can't afford more log text than that. If you want something that's more easily readable, turn that off and instead set log_min_duration_statement to zero, as indeed is recommended in the documentation. > but that just makes things more confusing. What's with the two PARSEs? > and why is the INSERT logged *before* the PARSE and BIND? It's hard to be sure without seeing the more verbose logging, but I think you are looking at interleaved parsing/execution of multiple prepared statements. Because that's allowed, simplifying the logging as you suggest doesn't seem very practical. My recollection is that the current behavior of the duration-logging options was hashed out in considerable detail some years ago, and is meant to serve multiple sets of requirements. It's not going to be that easy to persuade people to change it. regards, tom lane