RE: libpq debug log - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: libpq debug log
Date
Msg-id TYAPR01MB2990379993AFD6DE211E63B7FEA50@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: libpq debug log  ("iwata.aya@fujitsu.com" <iwata.aya@fujitsu.com>)
Responses RE: libpq debug log  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
RE: libpq debug log  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Re: libpq debug log  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
From: iwata.aya@fujitsu.com <iwata.aya@fujitsu.com>
> This patch includes these items:

Is there anything else in this revision that is not handled?

Below are my comments.

Also, why don't you try running the regression tests with a temporary modification to PQtrace() to output the trace to
afile?  The sole purpose is to confirm that this patch makes the test crash (core dump). 


(1)
-    conn->Pfdebug = debug_port;
+    if (pqTraceInit(conn))
+    {
+        conn->Pfdebug = debug_port;
+        if (conn->Pfdebug != NULL)
+            setlinebuf(conn->Pfdebug);
+    }
+    else
+    {
+        /* XXX report ENOMEM? */
+        fprintf(conn->Pfdebug, "Failed to initialize trace support\n");
+        fflush(conn->Pfdebug);
+    }
 }

* When the passed debug_port is NULL, the function should return after calling PQuntrace().

* Is setlinebuf() available on Windows?  Maybe you should use setvbuf() instead.

* I don't see the need for separate pqTraceInit() function, because it is only called here.


(2)
+bool
+pqTraceInit(PGconn *conn)
+{
+    /* already done? */
+    if (conn->be_msg != NULL)
+    {

What's this code for?  I think it's sufficient that PQuntrace() frees b_msg and PQtrace() always allocates b_msg.


(3)
+    conn->fe_msg = malloc(sizeof(pqFrontendMessage) +
+                            MAX_FRONTEND_MSGS * sizeof(pqFrontendMessageField));
+    conn->fe_msg->field_num = 0;
+    if (conn->fe_msg == NULL)
+    {
+        free(conn->be_msg);
+        /* NULL out for the case that fe_msg malloc fails */
+        conn->be_msg = NULL;
+        return false;
+    }
+    conn->fe_msg->field_num = 0;

The memory for the fields array is one entry larger than necessary.  The allocation code would be:

+    conn->fe_msg = malloc(offsetof(pqFrontendMessage, fields) +
+                            MAX_FRONTEND_MSGS * sizeof(pqFrontendMessageField));

Also, the line with "field_num = 0" appears twice.  The first one should be removed.


(4)
+static void
+pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
+{
...
+    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+    {

I think you can remove the protocol version check in various logging functions, because PQtrace() disables logging when
theprotocol version is 2. 


(5)
@@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char *name, const char *value)
     pgParameterStatus *pstatus;
     pgParameterStatus *prev;

-    if (conn->Pfdebug)
-        fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n",
-                name, value);
-

Where is this information output instead?


(6)
+ * Set up state so that we can trace. NB -- this might be called mutiple

mutiple -> multiple


(7)
+    LOG_FIRST_BYTE,                /* logging the first byte identifing the
+                                   protocol message type */

identifing -> identifying


(8)
+#define pqLogMsgByte1(CONN, CH, SOURCE) \
+((CONN)->Pfdebug ? pqLogMessageByte1(CONN, CH, SOURCE) : 0)

For functions that return void, follow the style of CHECK_FOR_INTERRUPTS() defined in miscadmin.h.


(9)
+                currtime = time(NULL);
+                tmp = localtime(&currtime);
+                strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S %Z", tmp);
+
+                fprintf(conn->Pfdebug, "%s %s ", timestr, message_direction);

It's better to have microsecond accuracy, because knowing the accumulation of such fine-grained timings may help to
troubleshootheavily-loaded client cases.  You can mimic setup_formatted_log_time() in src/backend/utils/error/elog.c.
Thisis used for the %m marker in log_line_prefix. 


(10)
I think it's better to use tabs (or any other character that is less likely to appear in the log field) as a field
separator,because it makes it easier to process the log with a spreadsheet or some other tools. 


(11)
+                /*
+                 * True type of message tagged '\0' is known when next 4 bytes is
+                 * checked. So we delay printing message type to pqLogMsgInt()
+                 */
+                if (v != '\0')
+                    fprintf(conn->Pfdebug, "%s ",
+                            pqGetProtocolMsgType((unsigned char) v, commsource));

In what cases does '\0' appear as a message type?


(12)
+static const char *
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
+    if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
+        return protocol_message_type_b[c];
+    else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
+        return protocol_message_type_f[c];
+    else
+        return "UnknownMessage";
+}

This function returns NULL (=0) when protocol_message_type_b/f[c] is 0.  That crashes the caller:

+                    fprintf(conn->Pfdebug, "%s ",
+                            pqGetProtocolMsgType((unsigned char) v, commsource));

Plus, you may as well print the invalid message type.  Why don't you do something like this:

+static const char *
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
+    static char unknown_message[64];
+    char *msg = NULL;

+    if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
+        msg = protocol_message_type_b[c];
+    else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
+        msg = protocol_message_type_f[c];
+
+    if (msg == NULL)
+    {
+        sprintf(unknown_message, "Unknown message %02x", c);
+        msg = unknown_message;
+    }
+
+    return msg;
+}


(13)
@@ -156,7 +156,12 @@ pqParseInput3(PGconn *conn)
         {
             /* If not IDLE state, just wait ... */
             if (conn->asyncStatus != PGASYNC_IDLE)
+            {
+                /* Terminate a half-finished logging message */
+                if (conn->Pfdebug)
+                    pqTraceForcelyBreakLine(msgLength, conn);
                 return;
+            }

             /*
              * Unexpected message in IDLE state; need to recover somehow.

What's this situation like?  Why output a new line and reset the trace status?


(14)
+/* pqLogInvalidProtocol: Print that the protocol message is invalid */
+static void
+pqLogInvalidProtocol(PGconn *conn)
+{
+    fprintf(conn->Pfdebug, ":::Invalid Protocol\n");
+    conn->be_msg->state = LOG_FIRST_BYTE;
+}

Is it sufficient to just reset the state field?  Isn't it necessary to call pqTraceResetBeMsg() instead?


(15)
@@ -212,15 +368,8 @@ pqSkipnchar(size_t len, PGconn *conn)
...
     conn->inCursor += len;
-
     return 0;
 }

This is an unnecessary removal of an empty line.


(16)
+static void
+pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
+{
+    char       *message_direction = commsource == FROM_BACKEND ? "<" : ">";
...
+        switch (conn->be_msg->state)

This function handles messages in both directions.  But the switch condition is based on the backend message state
(b_msg). How is this correct? 


(17)
What's the difference between pqLogMsgString() and pqLogMsgnchar()?  They both take a length argument.  Do they have to
beseparate functions? 


(18)
     if (conn->Pfdebug)
-        fprintf(conn->Pfdebug, "To backend> %c\n", c);
+        pqStoreFrontendMsg(conn, LOG_BYTE1, 1);

To match the style for backend messages with pqLogMsgByte1 etc., why don't you wrap the conn->Pfdebug check in the
macro?


(19)
@@ -520,8 +667,6 @@ pqPutMsgStart(char msg_type, bool force_len, PGconn *conn)
         /* allow room for message length */
         endPos += 4;
     }
-    else
-        lenPos = -1;

Why is this change necessary?


(20)
+static void
+pqLogFrontendMsg(PGconn *conn)
...
+    for (i = 0; i < conn->fe_msg->field_num; i++)
+    {
+        message_addr = conn->fe_msg->fields[i].offset_in_buffer;
+        length = conn->fe_msg->fields[i].length;
+
+        switch (conn->fe_msg[i].type)
+        {
+            case LOG_BYTE1:

When I saw this switch condition, the confusion exploded.  Shouldn't the type should be the attribute of each field as
follows?

+        switch (conn->fe_msg->fields[i].type)

I also found the function names for frontend -> backend messages hard to grasp.  Could you show the flow of function
callswhen sending messages to the backend? 


(21)
+                uint16        result16;
+                memcpy(&result16, conn->outBuffer + message_addr, length);

You should have an empty line between the declaration part and execution part.


(22)
I couldn't understand pqLogBinaryMsg() at all.  Could you explain what it does?  Particularly, if all bytes in the
stringis printable, the function seems to do nothing: 

+        if (isprint(v[i]))
+            continue;

Should the following part be placed outside the for loop?  What does 'pin' mean? (I want the variable name to be more
intuitive.)

+        if (pin < length)
+            fwrite(v + pin, 1, length - pin, conn->Pfdebug);


Regards
Takayuki Tsunakawa




pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Disable WAL logging to speed up data loading
Next
From: Justin Pryzby
Date:
Subject: Re: PoC/WIP: Extended statistics on expressions