Re: libpq debug log - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: libpq debug log
Date
Msg-id 20210121215247.GA31809@alvherre.pgsql
Whole thread Raw
In response to RE: libpq debug log  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: libpq debug log
List pgsql-hackers
On 2021-Jan-17, tsunakawa.takay@fujitsu.com wrote:

> * I don't see the need for separate pqTraceInit() function, because it is only called here.

That's true, but it'd require that we move PQtrace() to fe-misc.c,
because pqTraceInit() uses definitions that are private to that file.

> (2)
> +bool
> +pqTraceInit(PGconn *conn)
> +{
> +    /* already done? */
> +    if (conn->be_msg != NULL)
> +    {
> 
> What's this code for?  I think it's sufficient that PQuntrace() frees b_msg and PQtrace() always allocates b_msg.

The point is to be able to cope with a connection that calls PQtrace()
multiple times, with or without an intervening PQuntrace().
We'd make no friends if we made PQtrace() crash, or leak memory if
called a second time ...

> (5)
> @@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char *name, const char *value)
>      pgParameterStatus *pstatus;
>      pgParameterStatus *prev;
>  
> -    if (conn->Pfdebug)
> -        fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n",
> -                name, value);
> -
> 
> Where is this information output instead?

Hmm, seems to have been lost.  I restored it, but didn't verify
the resulting behavior carefully.

> (9)
> +                currtime = time(NULL);
> +                tmp = localtime(&currtime);
> +                strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S %Z", tmp);
> +
> +                fprintf(conn->Pfdebug, "%s %s ", timestr, message_direction);
> 
> It's better to have microsecond accuracy, because knowing the accumulation of such fine-grained timings may help to
troubleshootheavily-loaded client cases.  You can mimic setup_formatted_log_time() in src/backend/utils/error/elog.c.
Thisis used for the %m marker in log_line_prefix.
 

Yeah, it was me that removed printing of milliseconds -- Iwata-san's
original patch used ftime() which is not portable.  I had looked for
portable ways to get this but couldn't find anything that didn't involve
linking backend files, so I punted.  But looking again now, it is quite
simple: just use standard strftime and localtime and just concatenate
both together.  Similar to what setup_formatted_log_time except we don't
worry about the TZ at all.

> (10)
> I think it's better to use tabs (or any other character that is less likely to appear in the log field) as a field
separator,because it makes it easier to process the log with a spreadsheet or some other tools.
 

I can buy that.  I changed it to have some tabs; the lines are now:

timestamp "> or <" <tab> "message type" <tab> length <space> message contents

I think trying to apply tabs inside the message contents is going to be
more confusing than helpful.



> (12)
> [...]
> Plus, you may as well print the invalid message type.  Why don't you do something like this:
> 
> +static const char *
> +pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
> +{
> +    static char unknown_message[64];
> +    char *msg = NULL;
> 
> +    if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
> +        msg = protocol_message_type_b[c];
> +    else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
> +        msg = protocol_message_type_f[c];
> +
> +    if (msg == NULL)
> +    {
> +        sprintf(unknown_message, "Unknown message %02x", c);
> +        msg = unknown_message;
> +    }
> +
> +    return msg;
> +}

Right.  I had written something like this, but in the end decided not to
bother because it doesn't seem terribly important.  You can't do exactly
what you suggest, because it has the problem that you're returning a
stack-allocated variable even though your stack is about to be blown
away.  My copy had a static buffer that was malloc()ed on first use (and
if allocation fails, return a const string).  Anyway, I fixed the
crasher problem.

The protocol_message_type_b array had the serious problem it confused
the entry for byte 1 (0x01) with that of char '1' (and 2 and 3), so it
printed a lot of 'Unknown message' lines.  Clearly, the maintenance of
this array is going to be a pain point of this patch (counting number of
zeroes is no fun), and I think we're going to have some way to have an
explicit initializer, where we can do things like
    protocol_message_type_b['A'] = "NotificationResponse";
etc instead of the current way, which is messy, hard to maintain.
I'm not sure how to do that and not make things worse, however.

> (16)
> +static void
> +pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
> +{
> +    char       *message_direction = commsource == FROM_BACKEND ? "<" : ">";
> ...
> +        switch (conn->be_msg->state)
> 
> This function handles messages in both directions.  But the switch condition is based on the backend message state
(b_msg). How is this correct?
 

It's not.

I split things so that this function only prints backend messages; when
frontend messages are to be printed, we use a single fprintf() instead.
See about (20), below.

> 
> (17)
> What's the difference between pqLogMsgString() and pqLogMsgnchar()?  They both take a length argument.  Do they have
tobe separate functions?
 

nchar are not null-terminated and we escape !isprint() characters.  I'm
not sure that the difference is significant enough, but I'm also not
sure if they can be merged into one.

> (18)
>      if (conn->Pfdebug)
> -        fprintf(conn->Pfdebug, "To backend> %c\n", c);
> +        pqStoreFrontendMsg(conn, LOG_BYTE1, 1);
> 
> To match the style for backend messages with pqLogMsgByte1 etc., why don't you wrap the conn->Pfdebug check in the
macro?

I think these macros are useless and should be removed.  There's more
verbosity and confusion caused by them, than the clarity they provide.

> (20)
> +static void
> +pqLogFrontendMsg(PGconn *conn)
> ...
> +    for (i = 0; i < conn->fe_msg->field_num; i++)
> +    {
> +        message_addr = conn->fe_msg->fields[i].offset_in_buffer;
> +        length = conn->fe_msg->fields[i].length;
> +
> +        switch (conn->fe_msg[i].type)
> +        {
> +            case LOG_BYTE1:
> 
> When I saw this switch condition, the confusion exploded.  Shouldn't the type should be the attribute of each field
asfollows?
 
> 
> +        switch (conn->fe_msg->fields[i].type)
> 
> I also found the function names for frontend -> backend messages hard to grasp.  Could you show the flow of function
callswhen sending messages to the backend?
 

Exactly!  I was super confused about this too, and eventually decided to
rewrite the whole thing so that the 'type' is in the Fields struct.  And
while doing that, I noticed that it would make more sense to let the
fe_msg array be realloc'ed if it gets full, rather than making it fixed
size.  This made me add pqTraceUninit(), so that we can free the array
if it has grown, to avoid reserving arbitrarily large amounts of memory
after PQuntrace() on a session that traced large messages.

> (22)
> I couldn't understand pqLogBinaryMsg() at all.  Could you explain what it does?  Particularly, if all bytes in the
stringis printable, the function seems to do nothing:
 
> 
> +        if (isprint(v[i]))
> +            continue;
> 
> Should the following part be placed outside the for loop?  What does 'pin' mean? (I want the variable name to be more
intuitive.)
> 
> +        if (pin < length)
> +            fwrite(v + pin, 1, length - pin, conn->Pfdebug);

Yes :-(  I fixed that too.

This patch needs more work still, of course.

-- 
Álvaro Herrera       Valdivia, Chile
"Java is clearly an example of money oriented programming"  (A. Stepanov)

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Is Recovery actually paused?
Next
From: Heikki Linnakangas
Date:
Subject: Re: Avoiding smgrimmedsync() during nbtree index builds