Re: libpq debug log - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: libpq debug log |
Date | |
Msg-id | 20210121215247.GA31809@alvherre.pgsql Whole thread Raw |
In response to | RE: libpq debug log ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
Responses |
RE: libpq debug log
|
List | pgsql-hackers |
On 2021-Jan-17, tsunakawa.takay@fujitsu.com wrote: > * I don't see the need for separate pqTraceInit() function, because it is only called here. That's true, but it'd require that we move PQtrace() to fe-misc.c, because pqTraceInit() uses definitions that are private to that file. > (2) > +bool > +pqTraceInit(PGconn *conn) > +{ > + /* already done? */ > + if (conn->be_msg != NULL) > + { > > What's this code for? I think it's sufficient that PQuntrace() frees b_msg and PQtrace() always allocates b_msg. The point is to be able to cope with a connection that calls PQtrace() multiple times, with or without an intervening PQuntrace(). We'd make no friends if we made PQtrace() crash, or leak memory if called a second time ... > (5) > @@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char *name, const char *value) > pgParameterStatus *pstatus; > pgParameterStatus *prev; > > - if (conn->Pfdebug) > - fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n", > - name, value); > - > > Where is this information output instead? Hmm, seems to have been lost. I restored it, but didn't verify the resulting behavior carefully. > (9) > + currtime = time(NULL); > + tmp = localtime(&currtime); > + strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S %Z", tmp); > + > + fprintf(conn->Pfdebug, "%s %s ", timestr, message_direction); > > It's better to have microsecond accuracy, because knowing the accumulation of such fine-grained timings may help to troubleshootheavily-loaded client cases. You can mimic setup_formatted_log_time() in src/backend/utils/error/elog.c. Thisis used for the %m marker in log_line_prefix. Yeah, it was me that removed printing of milliseconds -- Iwata-san's original patch used ftime() which is not portable. I had looked for portable ways to get this but couldn't find anything that didn't involve linking backend files, so I punted. But looking again now, it is quite simple: just use standard strftime and localtime and just concatenate both together. Similar to what setup_formatted_log_time except we don't worry about the TZ at all. > (10) > I think it's better to use tabs (or any other character that is less likely to appear in the log field) as a field separator,because it makes it easier to process the log with a spreadsheet or some other tools. I can buy that. I changed it to have some tabs; the lines are now: timestamp "> or <" <tab> "message type" <tab> length <space> message contents I think trying to apply tabs inside the message contents is going to be more confusing than helpful. > (12) > [...] > Plus, you may as well print the invalid message type. Why don't you do something like this: > > +static const char * > +pqGetProtocolMsgType(unsigned char c, PGCommSource commsource) > +{ > + static char unknown_message[64]; > + char *msg = NULL; > > + if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b)) > + msg = protocol_message_type_b[c]; > + else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f)) > + msg = protocol_message_type_f[c]; > + > + if (msg == NULL) > + { > + sprintf(unknown_message, "Unknown message %02x", c); > + msg = unknown_message; > + } > + > + return msg; > +} Right. I had written something like this, but in the end decided not to bother because it doesn't seem terribly important. You can't do exactly what you suggest, because it has the problem that you're returning a stack-allocated variable even though your stack is about to be blown away. My copy had a static buffer that was malloc()ed on first use (and if allocation fails, return a const string). Anyway, I fixed the crasher problem. The protocol_message_type_b array had the serious problem it confused the entry for byte 1 (0x01) with that of char '1' (and 2 and 3), so it printed a lot of 'Unknown message' lines. Clearly, the maintenance of this array is going to be a pain point of this patch (counting number of zeroes is no fun), and I think we're going to have some way to have an explicit initializer, where we can do things like protocol_message_type_b['A'] = "NotificationResponse"; etc instead of the current way, which is messy, hard to maintain. I'm not sure how to do that and not make things worse, however. > (16) > +static void > +pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource) > +{ > + char *message_direction = commsource == FROM_BACKEND ? "<" : ">"; > ... > + switch (conn->be_msg->state) > > This function handles messages in both directions. But the switch condition is based on the backend message state (b_msg). How is this correct? It's not. I split things so that this function only prints backend messages; when frontend messages are to be printed, we use a single fprintf() instead. See about (20), below. > > (17) > What's the difference between pqLogMsgString() and pqLogMsgnchar()? They both take a length argument. Do they have tobe separate functions? nchar are not null-terminated and we escape !isprint() characters. I'm not sure that the difference is significant enough, but I'm also not sure if they can be merged into one. > (18) > if (conn->Pfdebug) > - fprintf(conn->Pfdebug, "To backend> %c\n", c); > + pqStoreFrontendMsg(conn, LOG_BYTE1, 1); > > To match the style for backend messages with pqLogMsgByte1 etc., why don't you wrap the conn->Pfdebug check in the macro? I think these macros are useless and should be removed. There's more verbosity and confusion caused by them, than the clarity they provide. > (20) > +static void > +pqLogFrontendMsg(PGconn *conn) > ... > + for (i = 0; i < conn->fe_msg->field_num; i++) > + { > + message_addr = conn->fe_msg->fields[i].offset_in_buffer; > + length = conn->fe_msg->fields[i].length; > + > + switch (conn->fe_msg[i].type) > + { > + case LOG_BYTE1: > > When I saw this switch condition, the confusion exploded. Shouldn't the type should be the attribute of each field asfollows? > > + switch (conn->fe_msg->fields[i].type) > > I also found the function names for frontend -> backend messages hard to grasp. Could you show the flow of function callswhen sending messages to the backend? Exactly! I was super confused about this too, and eventually decided to rewrite the whole thing so that the 'type' is in the Fields struct. And while doing that, I noticed that it would make more sense to let the fe_msg array be realloc'ed if it gets full, rather than making it fixed size. This made me add pqTraceUninit(), so that we can free the array if it has grown, to avoid reserving arbitrarily large amounts of memory after PQuntrace() on a session that traced large messages. > (22) > I couldn't understand pqLogBinaryMsg() at all. Could you explain what it does? Particularly, if all bytes in the stringis printable, the function seems to do nothing: > > + if (isprint(v[i])) > + continue; > > Should the following part be placed outside the for loop? What does 'pin' mean? (I want the variable name to be more intuitive.) > > + if (pin < length) > + fwrite(v + pin, 1, length - pin, conn->Pfdebug); Yes :-( I fixed that too. This patch needs more work still, of course. -- Álvaro Herrera Valdivia, Chile "Java is clearly an example of money oriented programming" (A. Stepanov)
Attachment
pgsql-hackers by date: