Re: [HACKERS] statement logging / extended query protocol issues - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [HACKERS] statement logging / extended query protocol issues
Date
Msg-id 200509230154.j8N1sVi17799@candle.pha.pa.us
Whole thread Raw
Responses Re: [HACKERS] statement logging / extended query protocol issues  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
Here are comments on this posting and the patch (it is long):

Simon Riggs wrote:
> On Thu, 2005-09-08 at 13:14 +1200, Oliver Jowett wrote:
> > Simon Riggs wrote:
> > > Oliver, would it be possible to show a simplified call sequence and what
> > > you would like to see logged for each call?
>
> These are good: Maybe it should even be in the docs for the driver?
> It would be good if it could be written as a test within the driver, so
> we can expand it and test the logging.
>
> > The JDBC driver generates one of these sequences:
> >
> > (1)
> >   Parse (unnamed statement) "SELECT 1"
> >   Bind (unnamed statement -> unnamed portal)
> >   Execute (unnamed portal, no row limit)
> >
> > (2)
> >   Parse (named statement S_1) "SELECT 1"
> >   repeatedly:
> >     Bind (named statement S_1 -> unnamed portal)
> >     Execute (unnamed portal, no row limit)
> >
> > (3)
> >   Parse (named statement S_2) "SELECT 1"
> >   repeatedly:
> >     Bind (named statement S_2 -> named portal C_2)
> >     repeatedly:
> >       Execute (named portal C_2, row limit 42)
>
> Are we sure there is just 3 cases?
>
> > Ideal output is:
> >
> > (1)  LOG: statement: SELECT 1
> >
> > (2)  LOG: statement: SELECT 1
> >      LOG: statement: SELECT 1
> >      LOG: statement: SELECT 1
> >
> > (3)  LOG: statement: SELECT 1
> >      LOG: statement: SELECT 1
> >      LOG: statement: SELECT 1
> >
> > In case (3), that's one log line per repeat of the outer loop,
> > regardless of how many Executes are sent in the inner loop.
>
> > Note that case (1) is the most common case for application queries via
> > the JDBC driver, and case (2) is the most common for internally
> > generated statements like BEGIN.
>
> Even if case (3) is not that common, I still want to know it is
> occurring, to see what effect or overhead it has.

Agreed.

> > As you can see from the output I'd like, I don't think that synthesizing
> > FETCH / EXECUTE queries that don't actually exist [is a]
> > useful thing to do, at least at the Joe Average User
> > level.
>
> Your original point at the top of this thread was valid: a get-next-rows
> shouldn't look like a re-execute. We can call it something else if you
> like, as long as we can tell the difference.
>
> We'll only see the output for case (3) when someone has programmed
> things that way by using setFetchSize.
>
> > Also note that the JDBC driver doesn't exercise all of the extended
> > protocol -- for example it's possible to re-Bind the same unnamed
> > statement repeatedly, or have multiple Executes on an unnamed portal
> > with a row limit, but the JDBC driver never does that.
>
> I agree there's not much gained from displaying the BIND statement as it
> is. I argued previously against including the BIND parameters. Now I
> would say we should either include them or leave out BIND altogether.

Agreed.  We should output BIND parameters at some point.  We do log
client-side PREPARE parameters now.

> Here's a new suggestion and patch that brings together
> - Oliver and Simon's wish to remove BIND from normal logging
> - Oliver's suggestion to remove the PREPARE logging for unnamed
> statements, which would otherwise double-up logging for case(1)
> - Bruce and Simon's view to keep some form of FETCH logging
> - Tom's view to rationalise the way ROWS is mentioned
>
> (lines beginning jdbc don't show in the log, but are just there to show
> clearly the time sequence of activities and what gets subsequently
> logged)
>
> (1)
> >   Parse (unnamed statement) "SELECT * from pg_proc"
> >   Bind (unnamed statement -> unnamed portal)
> >   Execute (unnamed portal, no row limit)
>
> (1)
> jdbc parse
> jdbc bind
> jdbc execute
> LOG:  statement: SELECT * from pg_proc
>
> jdbc parse
> jdbc bind
> jdbc execute
> LOG:  statement: SELECT * from pg_proc
>
> jdbc parse
> jdbc bind
> jdbc execute
> LOG:  statement: SELECT * from pg_proc
>
>
> Notice that the parse of the unnamed statement does *not* now generate a
> log record.

What is the reason for this?  I am not so worried about what JDBC uses
but what the protocol supports.  From my reading of the documentatation
is seems to support a single prepare and multiple executes of an unnamed
statement.

> (2)
> >   Parse (named statement S_1) "SELECT * from pg_proc"
> >   repeatedly:
> >     Bind (named statement S_1 -> unnamed portal)
> >     Execute (unnamed portal, no row limit)
>
> (2)
> jdbc parse S_1
> LOG:  statement: PREPARE S_1 AS SELECT * from pg_proc
> (perhaps this should be logged at BIND time, just like the
> optimization?)
>
> jdbc bind S_1
> jdbc execute
> LOG:  statement: EXECUTE <unnamed> [PREPARE:  SELECT * from pg_proc]
>
> jdbc bind S_1
> jdbc execute
> LOG:  statement: EXECUTE <unnamed> [PREPARE:  SELECT * from pg_proc]
>
> jdbc bind S_1
> jdbc execute
> LOG:  statement: EXECUTE <unnamed> [PREPARE:  SELECT * from pg_proc]
>
>
> ...I wonder if <unnamed> just confuses what is going on here? I've left
> it in for now, but suggest that we take that out again?

What do you want to replace it with?  A missing name seems confusing.

> (3)
> >   Parse (named statement S_2) "SELECT * from pg_proc"
> >   repeatedly:
> >     Bind (named statement S_2 -> named portal C_2)
> >     repeatedly:
> >       Execute (named portal C_2, row limit 42)
>
> (3)
> jdbc prepare S_2
> LOG:  statement: PREPARE S_2 AS SELECT * from pg_proc
>
> jdbc bind S_2 to C_2
> jdbc execute C_2
> LOG:  statement: EXECUTE C_2 ROWS 42 [PREPARE:  SELECT * from pg_proc]
> jdbc next (after cache has run out on 42nd row)
> v3 protocol sends E for Execute, execution halts at 49 rows for this set
> of bind parameters
> LOG:  statement: FETCH C_2 ROWS 7
>
> jdbc bind S_2 to C_2
> jdbc execute C_2
> LOG:  statement: EXECUTE C_2 ROWS 42 [PREPARE:  SELECT * from pg_proc]
> jdbc next (after cache has run out on 42nd row)
> v3 protocol sends E for Execute
> LOG:  statement: FETCH C_2 ROWS 42
> jdbc next (after cache has run out on 84th row)
> v3 protocol sends E for Execute, execution halts at 95 rows for this set
> of bind parameters
> LOG:  statement: FETCH C_2 ROWS 11
>
> Note: log_min_duration_statement logs after execution so can give
> accurate row counts of what was retrieved for first execute and
> subsequent fetches. In that case we log using the word ROWS.
> log_statement=all logs before execution and so only knows what the
> maximum number of rows requested is, not what the actual number of rows
> retrieved will be. In that case we log using the word MAXROWS.
> ROWS and MAXROWS are *not* mentioned unless we specifically set max_rows
> in the execute request using the v3 protocol.
>
> If we agree, I'd suggest this goes into the docs...
>
> I've not written a comprehensive test program that covers all of the
> different settings of v2/v3, named/unnamed, allrows/restricted rows,
> log_statement=all/log_min_duration_statement. I'm not sure that will fit
> within the existing test framework. So this patch is still prototype.

Here are my comments on the patch:

> Index: src/backend/tcop/postgres.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
> retrieving revision 1.458
> diff -c -c -r1.458 postgres.c
> *** src/backend/tcop/postgres.c 2 Sep 2005 21:50:54 -0000   1.458
> --- src/backend/tcop/postgres.c 9 Sep 2005 10:06:46 -0000
> ***************
> *** 1162,1173 ****
>     if (save_log_statement_stats)
>         ResetUsage();
>
> -   if (log_statement == LOGSTMT_ALL)
> -       ereport(LOG,
> -               (errmsg("statement: PREPARE %s AS %s",
> -                       (*stmt_name != '\0') ? stmt_name : "<unnamed>",
> -                       query_string)));
> -
>     /*
>      * Start up a transaction command so we can run parse analysis etc.
>      * (Note that this will normally change current memory context.)
> --- 1162,1167 ----
> ***************
> *** 1217,1222 ****
> --- 1211,1222 ----
>
>     QueryContext = CurrentMemoryContext;
>
> +
> +     /* Log parse only when using named statements, unless in debug1 mode */
> +       if (log_statement == LOGSTMT_ALL)
> +           ereport((is_named ? DEBUG1 : LOG),
> +               (errmsg("statement: PREPARE %s AS %s", stmt_name, query_string)));
> +

This seems backwards.  The comments says log only named statements, but
the '?' test makes named statements DEBUG.  Is this intended?

Why have you dropped the "<unnamed>"?  Whether it is DEBUG or not we
should display meaningful output.

>     /*
>      * Do basic parsing of the query or queries (this should be safe even
>      * if we are in aborted transaction state!)
> ***************
> *** 1467,1473 ****
>         portal = CreatePortal(portal_name, false, false);
>
>     if (log_statement == LOGSTMT_ALL)
> !       ereport(LOG,
>                 (errmsg("statement: <BIND> %s", portal_name)));
>
>     /*
> --- 1467,1473 ----
>         portal = CreatePortal(portal_name, false, false);
>
>     if (log_statement == LOGSTMT_ALL)
> !       ereport(DEBUG1,
>                 (errmsg("statement: <BIND> %s", portal_name)));

Can we get the BIND parameters here?  They seem significant for
debugging.

>     /*
> ***************
> *** 1683,1688 ****
> --- 1683,1689 ----
>     bool        save_log_duration = log_duration;
>     int         save_log_min_duration_statement = log_min_duration_statement;
>     bool        save_log_statement_stats = log_statement_stats;
> +     bool        subsequent_fetch = false;

Can we call this "execute_is_fetch".  That seems clearer.

>
>     /* Adjust destination to tell printtup.c what to do */
>     dest = whereToSendOutput;
> ***************
> *** 1695,1700 ****
> --- 1696,1710 ----
>                 (errcode(ERRCODE_UNDEFINED_CURSOR),
>                  errmsg("portal \"%s\" does not exist", portal_name)));
>
> +     /*
> +      * If we re-issue an Execute protocol request against an existing
> +      * portal, then we are only fetching more rows rather than
> +      * completely re-executing the query from the start. atStart is never
> +      * reset for a v3 portal, so we are safe to use this check.
> +      */
> +     if (!portal->atStart)
> +         subsequent_fetch = true;
> +
>     /*
>      * If the original query was a null string, just return
>      * EmptyQueryResponse.
> ***************
> *** 1706,1712 ****
>         return;
>     }
>
> !   if (portal->sourceText)
>     {
>         debug_query_string = portal->sourceText;
>         pgstat_report_activity(portal->sourceText);
> --- 1716,1727 ----
>         return;
>     }
>
> !     if (subsequent_fetch)
> !     {
> !       debug_query_string = "fetch message";
> !       pgstat_report_activity("<FETCH>");
> !     }
> !     else if (portal->sourceText)

Shouldn't we supply the name of the prepare here, or don't we have one?
If not, we should call it "unnamed".

>     {
>         debug_query_string = portal->sourceText;
>         pgstat_report_activity(portal->sourceText);
> ***************
> *** 1731,1742 ****
>     if (save_log_statement_stats)
>         ResetUsage();
>
>     if (log_statement == LOGSTMT_ALL)
> !       /* We have the portal, so output the source query. */
> !       ereport(LOG,
> !               (errmsg("statement: EXECUTE %s  [PREPARE:  %s]",
> !                       (*portal_name != '\0') ? portal_name : "<unnamed>",
>                         portal->sourceText ? portal->sourceText : "")));
>
>     BeginCommand(portal->commandTag, dest);
>
> --- 1746,1774 ----
>     if (save_log_statement_stats)
>         ResetUsage();
>
> +     /*
> +      * log statement details - when using log_statement=all we log the
> +      * phrase MAXROWS because this occurs prior to execution
> +      */
>     if (log_statement == LOGSTMT_ALL)
> !     {
> !         if (subsequent_fetch)
> !           ereport(LOG,
> !               (errmsg("statement: FETCH %s MAXROWS %ld", portal_name,
> !                         max_rows)));
> !         else
> !           /* We have the portal, so output the source query. */
> !             if (max_rows <= 0)
> !               ereport(LOG,
> !                   (errmsg("statement: EXECUTE %s [PREPARE:  %s]",
> !                         portal_name,
>                         portal->sourceText ? portal->sourceText : "")));
> +             else
> +                   ereport(LOG,
> +                   (errmsg("statement: EXECUTE %s MAXROWS %ld [PREPARE:  %s]",
> +                         portal_name, max_rows,
> +                       portal->sourceText ? portal->sourceText : "")));
> +     }

MAXROWS seems useful.  Good.

>     BeginCommand(portal->commandTag, dest);
>
> ***************
> *** 1865,1877 ****
>         if (save_log_min_duration_statement == 0 ||
>             (save_log_min_duration_statement > 0 &&
>              usecs >= save_log_min_duration_statement * 1000))
> !           ereport(LOG,
> !                   (errmsg("duration: %ld.%03ld ms  statement: EXECUTE %s  [PREPARE:  %s]",
>                         (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
>                               (stop_t.tv_usec - start_t.tv_usec) / 1000),
>                         (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> !                           (*portal_name != '\0') ? portal_name : "<unnamed>",
> !                           portal->sourceText ? portal->sourceText : "")));
>     }
>
>     if (save_log_statement_stats)
> --- 1897,1929 ----
>         if (save_log_min_duration_statement == 0 ||
>             (save_log_min_duration_statement > 0 &&
>              usecs >= save_log_min_duration_statement * 1000))
> !         {
> !             if (subsequent_fetch)

Should the duration be snprintf into a local buffer and then used in
each ereport, rather than duplicating the code three times?

> !               ereport(LOG,
> !                   (errmsg("duration: %ld.%03ld ms  statement: FETCH %s ROWS %ld ",
>                         (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
>                               (stop_t.tv_usec - start_t.tv_usec) / 1000),
>                         (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> !                           portal_name,
> !                             max_rows)));
> !             else
> !                 if (completed)
> !                   ereport(LOG,
> !                       (errmsg("duration: %ld.%03ld ms  statement: EXECUTE %s [PREPARE:  %s]",
> !                           (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
> !                                 (stop_t.tv_usec - start_t.tv_usec) / 1000),
> !                           (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> !                               portal_name,
> !                               portal->sourceText ? portal->sourceText : "")));
> !                 else
> !                   ereport(LOG,
> !                       (errmsg("duration: %ld.%03ld ms  statement: EXECUTE %s ROWS %ld [PREPARE:  %s]",
> !                           (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
> !                                 (stop_t.tv_usec - start_t.tv_usec) / 1000),
> !                           (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> !                               portal_name, max_rows,
> !                               portal->sourceText ? portal->sourceText : "")));
> !         }
>     }
>
>     if (save_log_statement_stats)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-patches by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] Proposed patch to clean up signed-ness warnings
Next
From: Bruce Momjian
Date:
Subject: Re: small changes to autovacuum maintainance