Thread: libpq debug log

libpq debug log

From
"Iwata, Aya"
Date:
Hi,

I'm going to propose libpq debug log for analysis of queries on the application side.
I think that it is useful to determine whether the cause is on the application side or the server side when a slow
queryoccurs. 
 

The provided information is "date and time" at which execution of processing is started, "query", "application side
processing",which is picked up information from PQtrace(), and "connection id", which is for uniquely identifying the
connection.
 

To collect the log, set the connection string or environment variable.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log size

What do you think about this?  Do you think that such feature is necessary?

Regards, 
Aya Iwata



Re: libpq debug log

From
Tom Lane
Date:
"Iwata, Aya" <iwata.aya@jp.fujitsu.com> writes:
> I'm going to propose libpq debug log for analysis of queries on the application side.
> I think that it is useful to determine whether the cause is on the application side or the server side when a slow
queryoccurs.  

Hm, how will you tell that really?  And what's the advantage over the
existing server-side query logging capability?

> The provided information is "date and time" at which execution of processing is started, "query", "application side
processing",which is picked up information from PQtrace(), and "connection id", which is for uniquely identifying the
connection. 

PQtrace() is utterly useless for anything except debugging libpq
internals, and it's not tremendously useful even for that.  Don't
bother with that part.

Where will you get a "unique connection id" from?

How will you deal with asynchronously-executed queries --- either
the PQgetResult style, or the single-row-at-a-time style?

            regards, tom lane


Re: libpq debug log

From
Yugo Nagata
Date:
On Fri, 24 Aug 2018 04:38:22 +0000
"Iwata, Aya" <iwata.aya@jp.fujitsu.com> wrote:

> Hi,
> 
> I'm going to propose libpq debug log for analysis of queries on the application side.
> I think that it is useful to determine whether the cause is on the application side or the server side when a slow
queryoccurs. 
 

Do you mean you want to monitor the protocol message exchange at
the client side to analyze performance issues, right?  Actually,
this might be useful to determin where is the problem, for example,
the client application, the backend PostgreSQL, or somewhere in the
network.

Such logging can be implemented in the application, but if libpq
provides the standard way, it would be helpful to resolve a problem
without modifying the application code.

> The provided information is "date and time" at which execution of processing is started, "query", "application side
processing",which is picked up information from PQtrace(), and "connection id", which is for uniquely identifying the
connection.
 

I couldn't image how this is like. Could you show us a sample of log lines
in your head?

> To collect the log, set the connection string or environment variable.
> - logdir or PGLOGDIR : directory where log file written
> - logsize or PGLOGSIZE : maximum log size

How we can specify the log file name?  What should happen if a file size
exceeds to PGLOGSIZE?

Regards,
-- 
Yugo Nagata <nagata@sraoss.co.jp>


Re: libpq debug log

From
Robert Haas
Date:
On Fri, Aug 24, 2018 at 9:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PQtrace() is utterly useless for anything except debugging libpq
> internals, and it's not tremendously useful even for that.  Don't
> bother with that part.

I think that improving the output format could help with that a lot.
What it current produces is almost unreadable; adjusting it to emit
one line per protocol message would, I think, help a lot.  There are
probably other improvements that could be made at the same time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: libpq debug log

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Aug 24, 2018 at 9:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> PQtrace() is utterly useless for anything except debugging libpq
>> internals, and it's not tremendously useful even for that.  Don't
>> bother with that part.

> I think that improving the output format could help with that a lot.
> What it current produces is almost unreadable; adjusting it to emit
> one line per protocol message would, I think, help a lot.  There are
> probably other improvements that could be made at the same time.

I wouldn't mind throwing it out and reimplementing it ;-) ... tracing
at the logical message level rather than the byte level would help.
But still, I'm not really convinced that it has very much to do with
what you'd want in a user-level debug log.

Part of what's really bad about the PQtrace output is that in v2 protocol
the same output can be repeated several times as we try to parse a message
and conclude we don't have it all yet.  I believe that problem is gone
in v3, but it may be hard to do a consistent redesign until we nuke libpq's
v2 support.  Still, it might be past time for the latter, seeing that
we no longer promise pre-7.4 compatibility in either psql or pg_dump.

            regards, tom lane


RE: libpq debug log

From
"Iwata, Aya"
Date:
> "Iwata, Aya" <iwata.aya@jp.fujitsu.com> writes:
> > I'm going to propose libpq debug log for analysis of queries on the application
> side.
> > I think that it is useful to determine whether the cause is on the application
> side or the server side when a slow query occurs.
> 
> Hm, how will you tell that really?  And what's the advantage over the existing
> server-side query logging capability?

The log I would like to propose is used when the performance issue happen, 
system administrator knows the process of application internally and check if there is any problem.
"debug" is not the correct description of the feature. The correct one should be "trace". Should I create another
thread?


> > The provided information is "date and time" at which execution of processing
> is started, "query", "application side processing", which is picked up
> information from PQtrace(), and "connection id", which is for uniquely
> identifying the connection.
> 
> PQtrace() is utterly useless for anything except debugging libpq internals,
> and it's not tremendously useful even for that.  Don't bother with that part.

My initial intention was to get only useful information from PQTrace () since it acquires a lot of information.
Is there another way to obtain application side information besides PQTrace() ?

> Where will you get a "unique connection id" from?

When collecting trace log file in the application side, 
I think it is necessary to identify different connection.
In order to do this, when new empty PQconn structure is created, new connection id is also created.  
Then we output it in the trace log file for one application.

> How will you deal with asynchronously-executed queries --- either the
> PQgetResult style, or the single-row-at-a-time style?

In my understanding, PQgetResult style outputs logs of multiple result queries at once,
While the single-row-at-a-time style outputs log for each query. Is this correct?
I think PQgetResult style is better,
because this style traces the internal process of the application.

Regards,
Aya Iwata




RE: libpq debug log

From
"Iwata, Aya"
Date:
> > I'm going to propose libpq debug log for analysis of queries on the application
> side.
> > I think that it is useful to determine whether the cause is on the application
> side or the server side when a slow query occurs.
> 
> Do you mean you want to monitor the protocol message exchange at the client
> side to analyze performance issues, right?  Actually, this might be useful to
> determin where is the problem, for example, the client application, the backend
> PostgreSQL, or somewhere in the network.
> 
> Such logging can be implemented in the application, but if libpq provides the
> standard way, it would be helpful to resolve a problem without modifying the
> application code.

Since I'd like to monitor the information the server and the client exchange,
I think monitoring protocol messages is good.

When a slow query is occurs, we check this client side trace log.
The purpose of this log acquisition I thought is to identify where is the problem: 
server side, application side or traffic. 
And if the problem is in application side, checking the trace log to identify what is the problem.

> > The provided information is "date and time" at which execution of processing
> is started, "query", "application side processing", which is picked up
> information from PQtrace(), and "connection id", which is for uniquely
> identifying the connection.
> 
> I couldn't image how this is like. Could you show us a sample of log lines in
> your head?

I am roughly thinking as follows;

...
START : 2018/09/03 18:16:35.357 CONNECTION(1)
STATUS : Connection
SEND MESSAGE : 2018/09/03 18:16:35.358
<information send to backend...>
RECEIVE MESSAGE : 2018/09/03 18:16:35.359
<information receive from backend...>
END : 2018/09/03 18:16:35.360
...
START : 2018/09/03 18:16:35.357 CONNECTION(1)
QUERY : DECLARE myportal CURSOR FOR select * from pg_database
SEND MESSAGE : 2018/09/03 18:16:35.358
<information send to backend...>
RECEIVE MESSAGE : 2018/09/03 18:16:35.359
<information receive from backend...>
END : 2018/09/03 18:16:35.360
...

> > To collect the log, set the connection string or environment variable.
> > - logdir or PGLOGDIR : directory where log file written
> > - logsize or PGLOGSIZE : maximum log size
> 
> How we can specify the log file name?  What should happen if a file size exceeds
> to PGLOGSIZE?

The log file name is determined as follow.
libpq-%ApplicationName-%Y-%m-%d_%H%M%S.log

When the log file size exceeds to PGLOGSIZE, the log is output to another file.

Regards,
Aya Iwata



Re: libpq debug log

From
Tom Lane
Date:
"Iwata, Aya" <iwata.aya@jp.fujitsu.com> writes:
> The purpose of this log acquisition I thought is to identify where is the problem:
> server side, application side or traffic.

TBH, I think the sort of logging you're proposing would be expensive
enough that *it* would be the bottleneck in a lot of cases.  A lot
of people find that the existing server-side "log_statement" support
is too expensive to keep turned on in production --- and that logs only
received SQL queries, not the returned data, and certainly not every
message passed over the wire.

            regards, tom lane


Re: libpq debug log

From
Peter Eisentraut
Date:
On 04/09/2018 02:29, Iwata, Aya wrote:
> Since I'd like to monitor the information the server and the client exchange,
> I think monitoring protocol messages is good.
> 
> When a slow query is occurs, we check this client side trace log.
> The purpose of this log acquisition I thought is to identify where is the problem: 
> server side, application side or traffic. 
> And if the problem is in application side, checking the trace log to identify what is the problem.

Between perf/systemtap/dtrace and wireshark, you can already do pretty
much all of that.  Have you looked at those and found anything missing?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


RE: libpq debug log

From
"Iwata, Aya"
Date:
Let me explain this trace log in a bit more detail.

This log is not turned on in the system by default. 
Turn it on when an issue occurs and you want to check the information in the application in order to clarify the
cause.

I will present three use cases for this log.

1. Confirmation on cause of out-of-memory
We assume that Out-of-memory occurred in the process of storing the data received from the database.
However, the contents or length of the data is not known.
A trace log is obtained to find these out and what kind of data you have in which part (i.e. query result when
receivingfrom database).
 

2. Protocol error confirmation
When there is an error in the protocol transmitted from the client and an error occurs in the database server, the
protocolsent by the client is checked.
 
When the network is unstable, log is checked whether the database server is receiving protocol messages. 

3. Processing delay survey
If the processing in the application is slow and the processing in the database is fast, investigate the cause of the
processingdelay.
 
4 kind of time can be obtained by the log;

 Timestamp when SQL started
 Timestamp when information began to be sent to the backend
 Timestamp when information is successfully received in the application
 Timestamp when SQL ended

Then the difference between the time is checked to find out which part of process takes time.


I reconfirm the function I proposed.

If get the trace log, set PGLOGDIR/logdir and PGLOGSIZE/logsize.
These parameters are set in the environment variable or the connection service file.
- logdir or PGLOGDIR : directory where log file written
- logsize or PGLOGSIZE : maximum log size. When the log file size exceeds to PGLOGSIZE, the log is output to another
file.

The log file name is determined as follow.
libpq-%ConnectionID-%Y-%m-%d_%H%M%S.log

This is a trace log example;

...
Start: 2018-09-03 18:16:35.357 Connection(1)
Status: Connection
Send message: 2018-09-03 18:16:35.358
<information send to backend...>
Receive message: 2018-09-03 18:16:35.359
<information receive from backend...>
End: 2018-09-03 18:16:35.360
...
Start: 2018-09-03 18:16:35.357 Connection(1)            ...(1), (2)
Query: DECLARE myportal CURSOR FOR select * from pg_database    ...(3)
Send message: 2018-09-03 18:16:35.358                ...(4)
<information send to backend...>                    ...(5)
Receive message: 2018/09/03 18:16:35.359                ...(6)
<information receive from backend...>                ...(7)
End: 2018-09-03 18:16:35.360                    ...(8)
...


(1) Timestamp when SQL started
(2) Connection ID (Identify the connection)
(3) SQL statement
(4) Timestamp when information began to be sent to the backend
(5) send message to backend (Result of query, Protocol messages)
(6) Timestamp when information is successfully received in the application
(7) receive message from backend (Result of query, Protocol messages)
(8) Timestamp when SQL ended


Regards,
Iwata Aya



RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi, 

Sorry for my late response.

> Between perf/systemtap/dtrace and wireshark, you can already do pretty much
> all of that.  Have you looked at those and found anything missing?

These commands provide detailed information to us.
However, I think the trace log is necessary from the following point.

1. ease of use for users
It is necessary to take information that is easy to understand for database users.
This trace log is retrieved on the application server side.
Not only the database developer but also application developer will get and check this log.
Also, some of these commands return detailed PostgreSQL function names.
The trace log would be useful for users who do not know the inside of PostgreSQL (e.g. application developers)


2. obtain the same information on all OS
Some commands depend on the OS.
I think that it is important that the trace log information is compatible to each OS.

Regards,
Aya Iwata



RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi,

I create a first libpq trace log patch.

In this patch, 
- All message that PQtrace() gets are output to the libpq trace log file
  (I maybe select more effective message in the future patch)
- Trace log output style is changed slightly from previously proposed

This patch not include documentation,
but you can see parameter detail and how to use it by looking at my previous e-mail.
 
 If get the trace log, set PGLOGDIR/logdir and PGLOGSIZE/logsize.
 These parameters are set in the environment variable or the connection service
 file.
 - logdir or PGLOGDIR : directory where log file written
 - logsize or PGLOGSIZE : maximum log size(M). When the log file size exceeds to
 PGLOGSIZE, the log is output to another file.
 
 The log file name is determined as follow.
 libpq-%ProcessID-%Y-%m-%d_%H%M%S.log
 
Trace log example;
Start :  2018/10/30 08:02:24.433                    ... time(a)
Query: SELECT pg_catalog.set_config('search_path', '', false)
To backend> Msg Q
To backend> "SELECT pg_catalog.set_config('search_path', '', false)"
To backend> Msg complete, length 60
Start sending message to backend:  2018/10/30 08:02:24.433        ... time(b)
End sending message to backend:  2018/10/30 08:02:24.433        ... time(c)
Start receiving message from backend:  2018/10/30 08:02:24.434    ... time(d)
End receiving message from backend:  2018/10/30 08:02:24.434    ... time(e)
From backend> T
From backend (#4)> 35
From backend (#2)> 1
From backend> "set_config"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 25
From backend (#2)> 65535
From backend (#4)> -1
From backend (#2)> 0
From backend> D
From backend (#4)> 10
From backend (#2)> 1
From backend (#4)> 0
From backend> C
From backend (#4)> 13
From backend> "SELECT 1"
From backend> Z
From backend (#4)> 5
From backend> Z
From backend (#4)> 5
From backend> I
End :  2018/10/30 08:02:24.435                    ... time(f)

From time(a) to time(b): time for libpq processing
From time(b) to time(c): time for traffic
From time(c) to time(d): time for backend processing
From time(d) to time(e): time for traffic
From time(e) to time(f): time for libpq processing

Regards,
Aya Iwata

Attachment

Re: libpq debug log

From
Jim Doty
Date:
Greetings,

This is my first attempt at a patch review, so I will take a pass at the
low hanging fruit.

Initial Pass
============

+ Patch applies
+ Patch builds
+ Patch behaves as described in the thread

I tried a few small things:

When I set a relative path for `PGLOGDIR`, the files were correctly
written to the directory.

When I set a path for `PGLOGDIR` that didn't exist or was not
write-able, the patch writes no files, and does not alert the user that
no files are being written.

Performance
===========

I ran two permutations of make check, one with the patch applied but not
activated, and the other with with the files being written to disk. Each
permutation was run ten times, and the stats are below (times are in
seconds):

              min  max  median  mean
not logging  50.4 57.6    53.3  53.4
    logging  58.3 77.7    65.0  65.8


Cheers,
Jim Doty


Re: libpq debug log

From
Jacob Champion
Date:
On Tue, Oct 30, 2018 at 2:39 AM Iwata, Aya <iwata.aya@jp.fujitsu.com> wrote:
> I create a first libpq trace log patch.

Couple additional thoughts from a read-through of the patch:

- PQtrace() and the new trace-logging machinery overlap in some places
but not others -- and if both are set, PQtrace() will take precedence.
It seems like the two should not be separate.
- It isn't immediately clear to me how the new information in the logs
is intended to be used in concert with the old information. Maybe this
goes back to the comments by Tom and Robert higher in the thread --
that an overhaul of the PQtrace system is due. This patch as presented
would make things a little worse before they got better, I think.

That said, I think the overall idea -- application performance
information that can be enabled via the environment, without requiring
debugging privileges on a machine or the need to manually correlate
traces made by other applications -- is a good one, and something I
would use.

--Jacob


Re: libpq debug log

From
Haribabu Kommi
Date:

On Tue, Oct 30, 2018 at 8:38 PM Iwata, Aya <iwata.aya@jp.fujitsu.com> wrote:
Hi,

I create a first libpq trace log patch.

In this patch,
- All message that PQtrace() gets are output to the libpq trace log file
  (I maybe select more effective message in the future patch)
- Trace log output style is changed slightly from previously proposed

This patch not include documentation,
but you can see parameter detail and how to use it by looking at my previous e-mail.

 If get the trace log, set PGLOGDIR/logdir and PGLOGSIZE/logsize.
 These parameters are set in the environment variable or the connection service
 file.
 - logdir or PGLOGDIR : directory where log file written
 - logsize or PGLOGSIZE : maximum log size(M). When the log file size exceeds to
 PGLOGSIZE, the log is output to another file.

 The log file name is determined as follow.
 libpq-%ProcessID-%Y-%m-%d_%H%M%S.log

Trace log example;
Start :  2018/10/30 08:02:24.433                                        ... time(a)
Query: SELECT pg_catalog.set_config('search_path', '', false)
To backend> Msg Q
To backend> "SELECT pg_catalog.set_config('search_path', '', false)"
To backend> Msg complete, length 60
Start sending message to backend:  2018/10/30 08:02:24.433              ... time(b)
End sending message to backend:  2018/10/30 08:02:24.433                ... time(c)
Start receiving message from backend:  2018/10/30 08:02:24.434  ... time(d)
End receiving message from backend:  2018/10/30 08:02:24.434    ... time(e)
From backend> T
From backend (#4)> 35
From backend (#2)> 1
From backend> "set_config"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 25
From backend (#2)> 65535
From backend (#4)> -1
From backend (#2)> 0
From backend> D
From backend (#4)> 10
From backend (#2)> 1
From backend (#4)> 0
From backend> C
From backend (#4)> 13
From backend> "SELECT 1"
From backend> Z
From backend (#4)> 5
From backend> Z
From backend (#4)> 5
From backend> I
End :  2018/10/30 08:02:24.435                                  ... time(f)

From time(a) to time(b): time for libpq processing
From time(b) to time(c): time for traffic
From time(c) to time(d): time for backend processing
From time(d) to time(e): time for traffic
From time(e) to time(f): time for libpq processing

Thanks for the patch.

I have some comments related to the trace output that is getting
printed. The amount of log it is generating may not be understood
to many of the application developers. IMO, this should print
only the necessary information that can understood by any one
by default and full log with more configuration?

Regards,
Haribabu Kommi
Fujitsu Australia

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Jim Doty san,

Thank you for review! I'm sorry my late reply...

> Initial Pass
> ============
> 
> + Patch applies
> + Patch builds
> + Patch behaves as described in the thread
Thank you for your check.

> When I set a path for `PGLOGDIR` that didn't exist or was not write-able,
> the patch writes no files, and does not alert the user that no files are being
> written.
I understand. I think it means that it is necessary to confirm how the setting is going well. 
There is no warning method when connection string or the environment variable is wrong.

So I added following document:
+   If the setting of the file path by the connection string or the environment variable is
+   incorrect, the log file is not created in the intended location.
+   The maximum log file size you set is output to the beginning of the file, so you can check it.
And I added the process. Please see my v2 patch.

> Performance
> ===========
> 
> I ran two permutations of make check, one with the patch applied but not
> activated, and the other with with the files being written to disk. Each
> permutation was run ten times, and the stats are below (times are in
> seconds):
> 
>               min  max  median  mean
> not logging  50.4 57.6    53.3  53.4
>     logging  58.3 77.7    65.0  65.8
Thank you for your measurement. 
I'm thinking about adding a logging level so that only the necessary information can be printed by default. It was
pointedout by Haribabu san's e-mail.
 
This minimizes the impact of logging on performance.

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:

Hi Hari san,

 

Thank you for your comment! And sorry my late reply…

 

>I have some comments related to the trace output that is getting

>printed. The amount of log it is generating may not be understood

>to many of the application developers. IMO, this should print

>only the necessary information that can understood by any one

>by default and full log with more configuration?

 

I understand. And I agree your opinion.

I will add feature called “log level” that changes the amount of log output information with my future version patch.

 

Regards,

Aya Iwata

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Jacob san, 

Thank you for your comment! And sorry for late reply...

> Couple additional thoughts from a read-through of the patch:
> 
> - PQtrace() and the new trace-logging machinery overlap in some places but
> not others -- and if both are set, PQtrace() will take precedence.
> It seems like the two should not be separate.
I understand. This log is acquired for the purpose of investigating the cause part (server side or application side)
whenperformance is bad.
 
So I will search whether getting other place of PQtrace() is necessary or not. 
And I will reply after the research, please wait for a while a little.

> - It isn't immediately clear to me how the new information in the logs is
> intended to be used in concert with the old information. Maybe this goes back
> to the comments by Tom and Robert higher in the thread -- that an overhaul
> of the PQtrace system is due. This patch as presented would make things a
> little worse before they got better, I think.
> 
> That said, I think the overall idea -- application performance information
> that can be enabled via the environment, without requiring debugging
> privileges on a machine or the need to manually correlate traces made by other
> applications -- is a good one, and something I would use. 
Thank you. I think so, too. Some applications cannot be stopped to add the PQtrace() code.

Regards,
Aya Iwata

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi,

I created a new version patch. Please find attached my patch.

Changes:
Since v2 patch
I forgot to write the detail of v2 patch changes. So I write these.
- Fixed the " Start receiving message from backend:" message location. Because I found that message located at outside
of"retry4". 
 
- Changed location where output "start :" / "end :" messages and timestamp. The reason of the change is that v1 patch
didnot correspond to Asynchronous Command Processing.
 
- Added documentation
- Added documentation how to check mistake of logdir and/or logsize. (Based on review comment of Jim san's)
Since v3 patch
- Fixed my mistake on fe-connect.c. Time and message were output at the place where does not output in originally
PQtrace().These messages are needed only new trace log. So I fixed it.
 
- Fixed two points so that new trace log overlaps all places in PQtrace(). (Based on review comment of Jacob san's)

TODO:
I will add the feature called "logging level" on next version patch. I will attach it as soon as possible.

I'm marking it as "Needs review".

Regards,
Aya Iwata

Attachment

Re: libpq debug log

From
Peter Eisentraut
Date:
On 27/11/2018 08:42, Iwata, Aya wrote:
> I created a new version patch. Please find attached my patch.

This does not excite me.  It seems mostly redundant with using tcpdump.

If I were to debug networking problems, I wouldn't even trust this very
much because it relies on the willpower of all future PostgreSQL
developers to keep this accurately up to date, whereas tcpdump gives me
the truth from the kernel.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Peter,

Thank you for your reply!

> On 27/11/2018 08:42, Iwata, Aya wrote:
> > I created a new version patch. Please find attached my patch.
> 
> This does not excite me.  It seems mostly redundant with using tcpdump.
I will develop "log level". I'm planning not to output redundant message at the default level.

> If I were to debug networking problems, I wouldn't even trust this very much
> because it relies on the willpower of all future PostgreSQL developers to
> keep this accurately up to date, whereas tcpdump gives me the truth from the
> kernel.
I agree your concern about log trusty. It would be a good choice for only
skilled users to use tcpdump. I think libpq trace log will be used many users,
it includes users who not familiar with PostgreSQL protocols. The log would be easier to use 
because it shows "start time" and "end time". On tcpdump also shows 
the starting time and ending time but people need to know PostgreSQL protocol 
to get them.

And this log also is useful for Windows users.
Windows does not have originally networking trace tool. 

If you have any ideas about maintain this feature, I would like to know it.


Regards,
Aya Iwata

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi,

> TODO:
> I will add the feature called "logging level" on next version patch. I will
> attach it as soon as possible.
I created v4 patch. Please find attached the patch.
This patch developed "logminlevel" parameter. level1 and level2 can be set, level1 is the default. 
If level1 is set, it outputs the time in the functions. It helps to find out where it takes time.
If level2 is set, it outputs information on the protocol being exchanged in addition to level1 information.

I would appreciate if you could review my latest patch.

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi,

I created v5 patch. Please find attached the patch.

This patch updated following items; 
- Changed "else if" to "if" in traceLog_fprintf(). This means that both PQtrace() and new trace log are set, you can
getboth log result.
 
- Implemented loglevel with enum. This change makes it easier if you want to add new log levels.
- Checked http://commitfest.cputube.org/, I modified the code slightly.

I would appreciate if you could review my latest patch.


Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi,

I have developed a new libpq trace logging aimed at checking which side (server or client) is causing the performance
issue.

The new libpq trace log can do the following things;
- Setting whether to get log or not by using connection strings or environment variables. It means that application
sourcecode changes is not needed to get the log.
 
- Getting time when receive and send process start/end. Functions too.
- Setting log level; When level1(default) is set, it outputs the time in the function and connection time. When level2
isset, it outputs information on the protocol message being exchanged, in addition to level1 information.
 

I updated patch, but I am not sure if these changes and implementation are correct or not. So I need your comment and
advice.
I would appreciate your advice and develop/fix my patch further.

Regards,
Aya Iwata

RE: libpq debug log

From
"Nagaura, Ryohei"
Date:
Hi Iwata-san,

I used your patch for my private work, so I write my opinion and four feedback below.
On Fri, Jan 18, 2019 at 8:19 AM, Iwata, Aya wrote:
> - Setting whether to get log or not by using connection strings or environment
> variables. It means that application source code changes is not needed to get
> the log.
> - Getting time when receive and send process start/end. Functions too.
This merit was very helpful for my use, so I want your proposal function in postgres.

The followings are feedback from me.

1)
It would be better making the log format the same as the server log format, I think.
Your log format:
    2019/01/22 04:15:25.496 ...
Server log format:
    2019-01-22 04:15:25.496 UTC ...
There are two differences:
    One is separator character of date, "/" and "-".
    The another is standard time information.

2)
It was difficult for me to understand the first line message in the log file.
"Max log size is 10B, log min level is LEVEL1"
Does this mean as follows?
"The maximum size of this file is 10 Bytes, the parameter 'log min level' is set to LEVEL 1."

3)
Under the circumstance that the environment variables "PGLOGDIR" and "PGLOGSIZE" are set correctly,
the log file will also be created when the user connect the server with "psql".
Does this follow the specification you have thought?
Is there any option to unset only in that session when you want to connect with "psql"?

4)
Your patch affects the behavior of PQtrace().
The log of the existing PQtrace() is as follows:
From backend> "id"
From backend (#4)> 16387
From backend (#2)> 1
From backend (#4)> 23
...
Your patch makes PQtrace() including the following log in addition to the above.
To backend> Msg complete, length 27
Start sending message to backend:End sending message to backend:PQsendQuery end :PQgetResult start :Start receiving
messagefrom backend:End receiving message from backend:From backend> T
 
...


For your information.
Best regards,
---------------------
Ryohei Nagaura



RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Nagaura-san,

Thank you for your review and advice.

> This merit was very helpful for my use, so I want your proposal function in
> postgres.
Thank you. 

> 1)
> It would be better making the log format the same as the server log format,
> I think.
> Your log format:
>     2019/01/22 04:15:25.496 ...
> Server log format:
>     2019-01-22 04:15:25.496 UTC ...
> There are two differences:
>     One is separator character of date, "/" and "-".
>     The another is standard time information.
Sure. I will change separator character to "-" and add timezone information.

> 2)
> It was difficult for me to understand the first line message in the log file.
> "Max log size is 10B, log min level is LEVEL1"
> Does this mean as follows?
> "The maximum size of this file is 10 Bytes, the parameter 'log min level'
> is set to LEVEL 1."
Yes. The purpose of the line message is to check the value of the set parameter.
I will change it to you suggest.

> 3)
> Under the circumstance that the environment variables "PGLOGDIR" and
> "PGLOGSIZE" are set correctly, the log file will also be created when the
> user connect the server with "psql".
> Does this follow the specification you have thought?
> Is there any option to unset only in that session when you want to connect
> with "psql"?
There are no option to not output log when connected by "psql". 
It is not good to create lots of empty files. I think that the cause of this issue is that the initialization location
ofthe new trace log is not good.
 
I will fix it so that logs are not output when connected to "psql".

> 4)
> Your patch affects the behavior of PQtrace().
> The log of the existing PQtrace() is as follows:
> From backend> "id"
> From backend (#4)> 16387
> From backend (#2)> 1
> From backend (#4)> 23
> ...
> Your patch makes PQtrace() including the following log in addition to the
> above.
> To backend> Msg complete, length 27
> Start sending message to backend:End sending message to backend:PQsendQuery
> end :PQgetResult start :Start receiving message from backend:End receiving
> message from backend:From backend> T ...
Thank you for finding it. I will fix not to affect PQtrace().

Regards,
Aya Iwata

Re: libpq debug log

From
Andres Freund
Date:
Hi,

On 2018-11-28 23:20:03 +0100, Peter Eisentraut wrote:
> This does not excite me.  It seems mostly redundant with using tcpdump.

I think the one counter-argument to this is that using tcpdump in
real-world scenarios has become quite hard, due to encryption. Even with
access to the private key you cannot decrypt the stream.  Wonder if the
solution to that would be an option to write out the decrypted data into
a .pcap or such.

Greetings,

Andres Freund


Re: libpq debug log

From
Jacob Champion
Date:
On Thu, Feb 14, 2019 at 10:17 AM Andres Freund <andres@anarazel.de> wrote:
> On 2018-11-28 23:20:03 +0100, Peter Eisentraut wrote:
> > This does not excite me.  It seems mostly redundant with using tcpdump.
>
> I think the one counter-argument to this is that using tcpdump in
> real-world scenarios has become quite hard, due to encryption.

+1. Another difficulty is having the OS permissions to do the raw
packet dumps in the first place.

--Jacob


RE: libpq debug log

From
"Jamison, Kirk"
Date:
On February 14, 2019 6:16 PM +0000, Andres Freund wrote:

> Hi,

> On 2018-11-28 23:20:03 +0100, Peter Eisentraut wrote:
> > This does not excite me.  It seems mostly redundant with using tcpdump.

> I think the one counter-argument to this is that using tcpdump in real-world
> scenarios has become quite hard, due to encryption. Even with access to the
> private key you cannot decrypt the stream.  Wonder if the solution to that
> would be an option to write out the decrypted data into a .pcap or such.

I agree that network debug trace logs would be useful for users not knowledgeable
of postgres internals, so I understand the value of the feature,
as long as only necessary/digestible information is outputted.
I'll also check the patch later.

For Andres, I haven't looked into tcpdump yet, but I'd like to ask whether
or not the decrypted output to .pcap (if implemented) may be useful to
application users. What could be the limitations?
Could you explain a bit further on the idea?

Regards,
Kirk Jamison



Re: libpq debug log

From
Andres Freund
Date:
Hi,

On 2019-02-18 02:23:12 +0000, Jamison, Kirk wrote:
> For Andres, I haven't looked into tcpdump yet, but I'd like to ask whether
> or not the decrypted output to .pcap (if implemented) may be useful to
> application users. What could be the limitations?
> Could you explain a bit further on the idea?

Well, wireshark (and also tcpdump in a less comfortable manner) has a
dissector for the postgresql protocol. That allows to dig into various
parts. See e.g. the attached as an example of what you can see as the
response to a SELECT 1;

Right now that's not usable if the connection is via TLS, as pretty much
all encrypted connection use some form of forward secrecy, so even if
you had access the the private key, we'd not be able to parse it into an
unencrypted manner.

Greetings,

Andres Freund

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi, 

Because I mistook something about how to reply to e-mails, 
my last reply is not reflected in the thread.

Response to Nagaura san's review point, I fixed all his review notes, except for pointing out about psql.
Please see the attached updated patch.

> 1)
> It would be better making the log format the same as the server log format,
I changed date style and added timezone.

> 2)
> It was difficult for me to understand the first line message in the log file.
I changed the message as "The maximum size of this log is 3 Bytes, the parameter 'logminlevel' is set to level2  ".

> 3)
> Under the circumstance that the environment variables "PGLOGDIR" and
> "PGLOGSIZE" are set correctly, the log file will also be created when the
> user connect the server with "psql".
> Does this follow the specification you have thought?
> Is there any option to unset only in that session when you want to connect
> with "psql"?
By separating session using Tera Term or screen command, you can do what you want.
So I didn't make the code complicated by implementing the option.

> 4)
> Your patch affects the behavior of PQtrace().
Thank you. I fixed.

Regards,
Aya Iwata

Attachment

Re: libpq debug log

From
Ramanarayana
Date:
Hi,

I have tested the trace log implementation.Please find my feedback for the same.

Issues found while testing
---------------------------------
1) If invalid value is given to PGLOGMINLEVEL, empty log file is created which should not happen.
2) If log file size exceeds the value configured in PGLOGSIZE, new log file is not getting created.
3) If PGLOGSIZE is greater than 2048 bytes, log file is not created.Is this expected behaviour?
4) In the log file, an extra new line is present whenever the query is printed.Is this intentional?
5)Documentation for this feature is having grammatical errors and some spelling errors which can be looked into.

Feedback in the code
----------------------------------
1) if else statement should be used for checking log levels rather than multiple if statements
2) Across the code, sufficient space need to be left between parameters in functions and while using comparison operators  
3) In libpq-fe.h in the comments section it should trace rather than trce

Suggestions
----------------------
-> Will it better if the logs of all processes are written to a single log file with the log message containing the process id rather than one log file per process?


RE: libpq debug log

From
"Jamison, Kirk"
Date:
Hi Iwata-san,

Currently, the patch fails to build according to CF app.
As you know, it has something to do with the misspelling of function.
GetTimezoneInformation --> GetTimeZoneInformation

It sounds more logical to me if there is a parameter that switches on/off the logging
similar to other postgres logs. I suggest trace_log as the parameter name.
Like, this parameter needs to be enabled for logdir, logsize and loglevel, etc. to work.
The default is off.

If you're going to introduce that parameter, then the docs should be updated as well.
i.e. "When <literal>trace_log</literal> is enabled, this parameter..."

How about changing the following parameter names:
logdir --> log_directory
logsize --> log_size
logminlevel --> log_min_level

If it's helpful, you might want to look into how the other postgres logs (i.e. syslogger)
allow setting either absolute or relative path for log directory, and how the parameters
cover some of the comments above by Ramanarayana.

Regards,
Kirk Jamison

Re: libpq debug log

From
Robert Haas
Date:
On Mon, Feb 18, 2019 at 10:06 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
> It sounds more logical to me if there is a parameter that switches on/off the logging
> similar to other postgres logs. I suggest trace_log as the parameter name.

I'm not really sure that I like the design of this patch in any way.
But leaving that aside, trace_log seems like a poor choice of name,
because it's got two words telling you what kind of thing we are doing
(tracing, logging) and zero words describing the nature of the thing
being traced or logged (the wire protocol communication with the
client).  A tracing facility for the planner, or the executor, or any
other part of the system would have an equally good claim on the same
name.  That can't be right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Ramanarayana,

Thank you for your review and suggestion. Please see the attached updated patch.

> Issues found while testing
>1) If invalid value is given to PGLOGMINLEVEL, empty log file is created which should not happen.
Thank you for your test. However in my dev environment, empty log file is not created.
Could you explain more detail about 1)? I will check it again.

>2) If log file size exceeds the value configured in PGLOGSIZE, new log file is not getting created.
About 2) (and may be 1) ), perhaps is this something like that?

There are my mistake about first line information of created log file 
"The maximum size of this log is %s *Bytes*, the parameter 'logminlevel' is set to %s\n".
- Maximum size is not bytes but megabytes.
- Display logminlevel which set by user. Internally, an invalid value is not set to logminlevel.

So trust the created log file first line info, if you set PGLOGSIZE=1000 as the meaning of "set maximum log size to
1000Bytes",
 
a new file was not created even if it exceeds 1000 bytes.
If it is correct, I fixed the comment to output internal setting log maximum size and user setting value.

And if you set PGLOGMINLEVEL to invalid value (ex. "aaa"), it is not set to the parameter; The default value (level1)
isset internally. 
 
I fixed first line comment to output notification " if invalid value, level1(default) is set".

>3) If PGLOGSIZE is greater than 2048 bytes, log file is not created. Is this expected behavior?
Yes. I limit log file size.

>4) In the log file, an extra new line is present whenever the query is printed. Is this intentional?
Thank you, I fixed.

>5)Documentation for this feature is having grammatical errors and some spelling errors which can be looked into.
Thank you. I am checking my documentation now. I will fix it.

> Feedback in the code
Thank you. I fixed my code issue.

> Suggestions
I'll consider that...
Could you explain more about the idea?

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Kirk,

> Currently, the patch fails to build according to CF app.
> As you know, it has something to do with the misspelling of function.
> GetTimezoneInformation --> GetTimeZoneInformation
Thank you. I fixed it. Please see my v7 patch.

Regards,
Aya Iwata

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Robert, 

> I'm not really sure that I like the design of this patch in any way.
Aside from problems with my current documentation which I will fix,
could you explain more detail about the problem of the design? 
I would like to improve my current implementation based from feedback.

Regards,
Aya Iwata

RE: libpq debug log

From
"Jamison, Kirk"
Date:
On Wednesday, February 20, 2019 12:56 PM GMT+9, Robert Haas wrote:

> On Mon, Feb 18, 2019 at 10:06 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
> > It sounds more logical to me if there is a parameter that switches 
> > on/off the logging similar to other postgres logs. I suggest trace_log as the parameter name.

> I'm not really sure that I like the design of this patch in any way.
> But leaving that aside, trace_log seems like a poor choice of name, 
> because it's got two words telling you what kind of thing we are doing 
> (tracing, logging) and zero words describing the nature of the thing 
> being traced or logged (the wire protocol communication with the client).
> A tracing facility for the planner, or the executor, or any other part
> of the system would have an equally good claim on the same name.  That
> can't be right.

Agreed about the argument with trace_log parameter name. I just shootout
a quick idea. I didn't think about it too deeply, but just thought of a
switch that will enable or disable the feature. So there are 
definitely better names other than that. And as you suggested, should 
describe specifically what the feature does. 

Regards,
Kirk Jamison

Re: libpq debug log

From
Robert Haas
Date:
On Thu, Feb 21, 2019 at 7:52 PM Iwata, Aya <iwata.aya@jp.fujitsu.com> wrote:
> > I'm not really sure that I like the design of this patch in any way.
> Aside from problems with my current documentation which I will fix,
> could you explain more detail about the problem of the design?
> I would like to improve my current implementation based from feedback.

Well, I believe that what you've got here is something that could,
perhaps, be occasionally useful.  However, I don't think it would be
useful to very many people very often, and we'd still have to maintain
the code, so that's not a great situation.

We already have a PQtrace() facility that could be improved, and it
has been previously suggested that you work on improving this facility
rather than inventing something new.  I still think that's a good
idea.  Instead you've created a second way of producing similar
information, and then coupled it to very specific ideas about how that
information should be logged: it is triggered by new libpq parameters,
there is log rotation logic, etc.  Those might not be right for
everyone, and there's no flexibility in the mechanism.

I am not sure that it's a good idea to have facilities that write to
the local filesystem that can be triggered by libpq parameters.  Seems
like that might have possible security consequences, or at least annoy
people who want to accept connection strings from users without having
to sanitize them for these sorts of options.

I do sometimes want to know  what's going on at the protocol level.
Sometimes it's possible to use wireshark for that (as mentioned
upthread) and when it isn't, the thing I'd really like is for the
command-line clients that already exist have an option to enable
PQtrace without me having to hack the C code.  We could go through and
add a long-form command-line option, --libpq-trace, to every
command-line tool we ship, for example.  Then we could, as a separate
patch, improve the format of that tracing output.  For me that would
be more useful than this.

I don't necessarily think there's anything deeply wrong with this
approach.  It's not like your patch will bring about the end of
civilization or anything like that.  It just doesn't excite me very
much.  And since I agree that we have a problem in this area, I would
ideally like to have a solution to that problem that I feel excited
about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: libpq debug log

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Feb 21, 2019 at 7:52 PM Iwata, Aya <iwata.aya@jp.fujitsu.com> wrote:
>> Aside from problems with my current documentation which I will fix,
>> could you explain more detail about the problem of the design?

> We already have a PQtrace() facility that could be improved, and it
> has been previously suggested that you work on improving this facility
> rather than inventing something new.  I still think that's a good
> idea.

Me too.  PQtrace as currently constituted might be helpful for debugging
libpq itself, but it's nigh useless for any higher-level purpose.  I'd
gladly toss overboard any hypothetical use-case for what it does now,
in favor of having something that dumps stuff at the logical level of
messages.  (It's not even very consistent, because somebody did add a
more-or-less-message-level printout to pqSaveParameterStatus, which
I believe is duplicative of what fe-misc.c will print about the
same interaction at the byte level.)

> Instead you've created a second way of producing similar
> information, and then coupled it to very specific ideas about how that
> information should be logged: it is triggered by new libpq parameters,
> there is log rotation logic, etc.  Those might not be right for
> everyone, and there's no flexibility in the mechanism.

Good point.  To the extent that people want any of that, they'd probably
want to have application control over it.  Maybe, in addition to
PQtrace(FILE *) that'd just dump to a stdio stream, there should be a way
to create a callback similar to a notice-message hook, and let the
application implement such features in a custom callback.

> I am not sure that it's a good idea to have facilities that write to
> the local filesystem that can be triggered by libpq parameters.

Oy.  That seems like a *very* serious objection.  I agree with Robert's
thought that it'd be better to insist on application involvement in
enabling trace output.

            regards, tom lane


RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi,

Thank you for your comments and advice.

I'd like to consider your suggestions.
I am planning to change libpq logging like this;

1. Expand PQtrace() facility and improve libpq logging.

2. Prepare "output level". There are 3 type of levels;
    - TRADITIONAL    :  if set, outputs protocol messages
    - LEVEL1        :  if set, outputs phase and time
    - LEVEL2        :  if set, outputs both info TRADITIONAL and LEVEL1

3. Add "output phase" information; This is the processing flow. (ex. When PQexec() start and end )

4. Change output method to callback style; Default is stdout, and prepare other callback functions that will be used
morefrequently.
 

5. Initialization method; 
In current one: PQtrace(PGconn *conn, FILE *stream);
Proposed change: PQtraceEx(PGconn *conn, FILE *stream, PQloggingProcessor callback_func , void *callback_arg,
PGLogminlevellevel);
 
PQtrace() can be use as it is to consider compatibility with previous applications, 
so I leave PQtrace() and created a new function PQtraceEx(). 

After discussing the abovementioned, then maybe we can discuss more about enabling trace output and changing the output
style.

What do you think? I would appreciate your comments and suggestions.
 
Regards,
Aya Iwata



Re: libpq debug log

From
Kyotaro HORIGUCHI
Date:
Hello.

I came up with some random comments.

At Mon, 4 Mar 2019 08:13:00 +0000, "Iwata, Aya" <iwata.aya@jp.fujitsu.com> wrote in
<71E660EB361DF14299875B198D4CE5423DEF1844@g01jpexmbkw25>
> Hi,
> 
> Thank you for your comments and advice.
> 
> I'd like to consider your suggestions.
> I am planning to change libpq logging like this;
> 
> 1. Expand PQtrace() facility and improve libpq logging.
> 
> 2. Prepare "output level". There are 3 type of levels;
>     - TRADITIONAL    :  if set, outputs protocol messages
>     - LEVEL1        :  if set, outputs phase and time
>     - LEVEL2        :  if set, outputs both info TRADITIONAL and LEVEL1

You appear to want to segregate the "traditional" output from
what you want to emit. At least Tom is explicitly suggesting to
throw away the hypothtical use cases for it. You may sort out
what kind of information you/we want to emit as log messages from
scratch:p

You may organize log kinds into hierachical levels like server
log messages or into orthogonal types that are individually
turned on. But it is not necessarily be a parameter of a logging
processor. (mentioned below)

> 3. Add "output phase" information; This is the processing flow. (ex. When PQexec() start and end )

What is the advantage of it against just two independent messages
like PQexec_start and PQexec_end? (I don't see any advantage.)

> 4. Change output method to callback style; Default is stdout, and prepare other callback functions that will be used
morefrequently.
 

Are you going to make something less-used the default behavior? I
think no one is opposing rich functionality as far as it is
replaceable.

> 5. Initialization method;
> In current one: PQtrace(PGconn *conn, FILE *stream);
> Proposed change: PQtraceEx(PGconn *conn, FILE *stream, PQloggingProcessor callback_func , void *callback_arg,
PGLogminlevellevel);
 

I'm not sure we should add a new *EX() function. Couldn't we
just change the signature of PQtrace()?

callback_funs seems to be a single function. I think it's better
to have individual function for each message type.  Not
callback_func(PQLOG_EXEC_START, param_1, param_2,...)  ,but
PQloggingProcessor.PQexec_start(param_1, param_2, ...).

It is because the caller can simply pass values in its own type
to the function without casting or other transformations and
their types are checked statically.

I also think it's better that logger-specific paramters are not
passed in this level.  Maybe stream and level are logger-specific
paratmer, which can be combined into callback_arg, or can be
given via an API function.

> PQtrace() can be use as it is to consider compatibility with previous applications, 
> so I leave PQtrace() and created a new function PQtraceEx(). 
> 
> After discussing the abovementioned, then maybe we can discuss more about enabling trace output and changing the
outputstyle.
 

I'm not sure what you mean by "output style" but you can change
everything by replacing the whole callback processor, which may
be a dynamic loaded file which is loaded by the instruction in
both ~/.libpqrc and some API, like PQloadLoggingProcessor()?


> What do you think? I would appreciate your comments and suggestions.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Robert Haas
Date:
On Mon, Mar 4, 2019 at 3:13 AM Iwata, Aya <iwata.aya@jp.fujitsu.com> wrote:
> 2. Prepare "output level". There are 3 type of levels;
>         - TRADITIONAL   :  if set, outputs protocol messages
>         - LEVEL1                :  if set, outputs phase and time
>         - LEVEL2                :  if set, outputs both info TRADITIONAL and LEVEL1

I am not impressed by this proposal.  I think what we should be
focusing on here is how to clearly display the contents of a message.
I think we should be looking for a way to display each message on a
single line in a way that indicates the data types of the constituent
fields.  For example, here's a DataRow message as output by PQtrace
today:

From backend> D
From backend (#4)> 42
From backend (#2)> 4
From backend (#4)> 6
From backend (6)> public
From backend (#4)> 4
From backend (4)> tab1
From backend (#4)> 5
From backend (5)> table
From backend (#4)> 5
From backend (5)> rhaas

What I'd like to see for a case like this is something like:

<<< 'D' 42 #4 6 'public' 4 'tab1' 5 'table' 5 'rhaas'

And here's a RowDescription message today:

From backend> T
From backend (#4)> 101
From backend (#2)> 4
From backend> "Schema"
From backend (#4)> 2615
From backend (#2)> 2
From backend (#4)> 19
From backend (#2)> 64
From backend (#4)> -1
From backend (#2)> 0
From backend> "Name"
From backend (#4)> 1259
From backend (#2)> 2
From backend (#4)> 19
From backend (#2)> 64
From backend (#4)> -1
From backend (#2)> 0
From backend> "Type"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 25
From backend (#2)> 65535
From backend (#4)> -1
From backend (#2)> 0
From backend> "Owner"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 19
From backend (#2)> 64
From backend (#4)> -1
From backend (#2)> 0

And I propose that it should look like this:

<<< 'T' 101 4 "Schema" 2615 #2 19 #64 -1 #0 "Name" 1259 #2 19 #64 -1
#0 "Owner" 0 #0 19 #64 -1 #0

The basic idea being:

- Each line is a whole message.
- The line begins with <<< for a message received and >>> for a message sent.
- Strings in single quotes are those sent/received as a fixed number of bytes.
- Strings in double quotes are those sent/received as a string.
- 4-byte integers are printed unadorned.
- 2-byte integers are prefixed by #.
- I guess 1-byte integers would need some other prefix, maybe @ or ##.

Now if we want to timestamp those lines too, that'd be fine:

2019-03-04 21:33:39.338 EST <<< 'T' 101 4 "Schema" 2615 #2 19 #64 -1
#0 "Name" 1259 #2 19 #64 -1 #0 "Owner" 0 #0 19 #64 -1 #0
2019-03-04 21:33:39.342 EST <<< 'D' 42 #4 6 'public' 4 'tab1' 5
'table' 5 'rhaas'

But I still don't really see a need for different levels or whatever.
I mean, you either want a dump of all of the protocol traffic, or you
don't, I think.  Or maybe I am confused as to what the goal of all
this really is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: libpq debug log

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> The basic idea being:

> - Each line is a whole message.
> - The line begins with <<< for a message received and >>> for a message sent.

+1, though do we really need to repeat the direction marker thrice?

> - Strings in single quotes are those sent/received as a fixed number of bytes.
> - Strings in double quotes are those sent/received as a string.
> - 4-byte integers are printed unadorned.
> - 2-byte integers are prefixed by #.
> - I guess 1-byte integers would need some other prefix, maybe @ or ##.

I doubt that anybody gives a fig for those distinctions, except when
they're writing actual code that speaks the protocol --- and I do not
think that that's the target use-case.  So strings and integers seem
like plenty.  I'd also suggest that just because the protocol has
single-letter codes for message types doesn't mean that average users
have memorized those codes; and that framing data like the message
length is of no interest.

In short, rather than

    <<< 'T' 101 4 "Schema" 2615 #2 19 #64 -1 #0 "Name" 1259 #2 19 #64 -1 #0 "Owner" 0 #0 19 #64 -1 #0

I'd envision something more like

    < RowDescription "Schema" 2615 2 19 64 -1 0 "Name" 1259 2 19 64 -1 0 "Owner" 0 0 19 64 -1 0

> But I still don't really see a need for different levels or whatever.
> I mean, you either want a dump of all of the protocol traffic, or you
> don't, I think.  Or maybe I am confused as to what the goal of all
> this really is.

Yeah, me too.  But a lot of this detail would only be useful if you
were trying to diagnose something like a discrepancy between the server
and libpq as to the width of some field.  And the number of users for
that can be counted without running out of fingers.  I think what would
be of use for a trace facility is as high-level a view as possible of
the message contents.

Or, in other words: a large part of the problem with the existing PQtrace
facility is that it *was* designed to help debug libpq itself, and that
use-case is no longer very interesting.  We should assume that the library
knows how to parse protocol messages.

            regards, tom lane


Re: libpq debug log

From
Robert Haas
Date:
On Mon, Mar 4, 2019 at 10:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > The basic idea being:
> > - Each line is a whole message.
> > - The line begins with <<< for a message received and >>> for a message sent.
>
> +1, though do we really need to repeat the direction marker thrice?

Perhaps not.

> > - Strings in single quotes are those sent/received as a fixed number of bytes.
> > - Strings in double quotes are those sent/received as a string.
> > - 4-byte integers are printed unadorned.
> > - 2-byte integers are prefixed by #.
> > - I guess 1-byte integers would need some other prefix, maybe @ or ##.
>
> I doubt that anybody gives a fig for those distinctions, except when
> they're writing actual code that speaks the protocol --- and I do not
> think that that's the target use-case.  So strings and integers seem
> like plenty.  I'd also suggest that just because the protocol has
> single-letter codes for message types doesn't mean that average users
> have memorized those codes; and that framing data like the message
> length is of no interest.

I don't agree with that.  For one thing, I'm someone, and I give a fig.

I would put it this way: with a very small amount of additional
notation it's possible to preserve the level of detail that we have
currently, and I think that's worth it.  Your proposed format for the
sample message I showed is very slightly shorter, which will almost
certainly not matter to anyone, but it leaves some slight ambiguity
about what was happening at the protocol level, which might.  If you
don't care, the additional detail in my proposed format is easy enough
to ignore.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Horiguchi-san,

Thank you for your reply and suggestions.

> > 1. Expand PQtrace() facility and improve libpq logging.
> >
> > 2. Prepare "output level". There are 3 type of levels;
> >     - TRADITIONAL    :  if set, outputs protocol messages
> >     - LEVEL1        :  if set, outputs phase and time
> >     - LEVEL2        :  if set, outputs both info TRADITIONAL and LEVEL1
> 
> You appear to want to segregate the "traditional" output from what you want
> to emit. At least Tom is explicitly suggesting to throw away the hypothtical
> use cases for it. You may sort out what kind of information you/we want to
> emit as log messages from scratch:p
> 
> You may organize log kinds into hierachical levels like server log messages
> or into orthogonal types that are individually turned on. But it is not
> necessarily be a parameter of a logging processor. (mentioned below)

It is intended for old application users who use PQtrace() expect the existing/default/traditional log output style.
That’s why I separated other information(ex. phase and timestamp).
But since you mentioned the level is not necessary, 
I will follow your advice and include those information in the reformatted PQtrace().


> > 3. Add "output phase" information; This is the processing flow. (ex.
> > When PQexec() start and end )
> 
> What is the advantage of it against just two independent messages like
> PQexec_start and PQexec_end? (I don't see any advantage.)

I think the purpose of this logging improvement is to make it useful for analysis at performance deterioration.
When query delay happens, we want to know from which side(server or client) is the cause of it, 
and then people want to know which process takes time.
I think the phase and time information are useful for diagnosis.
For example, when command processing function (ex. PQexec()) etc. start/end 
and when client receive/send protocol messages.
/*my intended output here */


> > 4. Change output method to callback style; Default is stdout, and prepare
> other callback functions that will be used more frequently.
> 
> Are you going to make something less-used the default behavior? I think no
> one is opposing rich functionality as far as it is replaceable.
I am sorry, my explanation was not clear.
I just wanted to say I intend to provide several output method functions which users likely need.
ex. output to stdout, or output to file, or output to log directory.

 
> > 5. Initialization method;
> > In current one: PQtrace(PGconn *conn, FILE *stream); Proposed change:
> > PQtraceEx(PGconn *conn, FILE *stream, PQloggingProcessor callback_func
> > , void *callback_arg, PGLogminlevel level);
> 
> I'm not sure we should add a new *EX() function. Couldn't we just change the
> signature of PQtrace()?
I intended to add new *EX() function for compatibility purposes specially for old version of libpq. 
I would like to avoid making changes to old applications for this.
But if we insist on changing the PQtrace() itself, then I will follow your advice.

> 
> callback_funs seems to be a single function. I think it's better to have
> individual function for each message type.  Not
> callback_func(PQLOG_EXEC_START, param_1, param_2,...)  ,but
> PQloggingProcessor.PQexec_start(param_1, param_2, ...).
> 
> It is because the caller can simply pass values in its own type to the function
> without casting or other transformations and their types are checked
> statically.
> 
> I also think it's better that logger-specific paramters are not passed in
> this level.  Maybe stream and level are logger-specific paratmer, which can
> be combined into callback_arg, or can be given via an API function.

Thank you for your advice. I will consider it.

Regards,
Aya Iwata



RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi everyone,

I appreciate all the helpful advice.

I agree to display message more clearly. I will follow your advice.
    
I would like to add timestamp per line and when command processing function start/end.
I think it is useful to know the application process start/end for diagnosis.
So I will implement like this;

2019-03-03 07:24:54.142 UTC   PQgetResult start
2019-03-03 07:24:54.143 UTC   < 'T' 35 1 "set_config" 0 #0 25 #65535 -1 #0
2019-03-03 07:24:54.144 UTC   PQgetResult end


> > But I still don't really see a need for different levels or whatever.
> > I mean, you either want a dump of all of the protocol traffic, or you
> > don't, I think.  Or maybe I am confused as to what the goal of all
> > this really is.
> 
> Yeah, me too.  But a lot of this detail would only be useful if you were trying
> to diagnose something like a discrepancy between the server and libpq as to
> the width of some field.  And the number of users for that can be counted
> without running out of fingers.  I think what would be of use for a trace
> facility is as high-level a view as possible of the message contents.
> 
> Or, in other words: a large part of the problem with the existing PQtrace
> facility is that it *was* designed to help debug libpq itself, and that
> use-case is no longer very interesting.  We should assume that the library
> knows how to parse protocol messages.

Since I explained the reason in the previous email, I am copy-pasting it again here.

I think the purpose of the leveling is to provide an optional information for the user, 
which is useful for diagnosis during the performance deterioration.
When query delay happens, we want to know from which side(server or client) is the cause of it,
and then people want to know which process takes time.
I think the phase and time information are useful for diagnosis.
For example, when command processing function (ex. PQexec()) etc. start/end
and when client receive/send protocol messages.

So is it alright to add these information to the new/proposed PQtrace() default output?



Regards,
Aya Iwata



Re: RE: libpq debug log

From
David Steele
Date:
On 3/5/19 11:48 AM, Iwata, Aya wrote:
> 
> So is it alright to add these information to the new/proposed PQtrace() default output?

I agree with Andres [1] that it's not very clear where this patch is  
going and we should push the target to PG13.

Regards,
-- 
-David
david@pgmasters.net

[1]  
https://www.postgresql.org/message-id/raw/20190214203752.t4hl574k6jlu4t25%40alap3.anarazel.de


Re: Re: RE: libpq debug log

From
David Steele
Date:
On 3/5/19 5:28 PM, David Steele wrote:
> On 3/5/19 11:48 AM, Iwata, Aya wrote:
>>
>> So is it alright to add these information to the new/proposed 
>> PQtrace() default output?
> 
> I agree with Andres [1] that it's not very clear where this patch is 
> going and we should push the target to PG13.

Hearing no opinions to the contrary, I have set the target version to PG13.

Regards,
-- 
-David
david@pgmasters.net


RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi, 

> The basic idea being:
> 
> - Each line is a whole message.
> - The line begins with <<< for a message received and >>> for a message sent.
> - Strings in single quotes are those sent/received as a fixed number of bytes.
> - Strings in double quotes are those sent/received as a string.
> - 4-byte integers are printed unadorned.
> - 2-byte integers are prefixed by #.
> - I guess 1-byte integers would need some other prefix, maybe @ or ##.

I created v1 patch to improve PQtrace(); output log message in one line. 
Please find my attached patch.


Log output examples;

In current PQtrace log:

To backend> Msg Q
                   
 
To backend> "SELECT pg_catalog.set_config('search_path', '', false)"
To backend> Msg complete, length 60

I changed like this:

2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT pg_catalog.set_config('search_path', '', false)" 

In current PQtrace log:

From backend> T
From backend (#4)> 35
From backend (#2)> 1
From backend> "set_config"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 25
From backend (#2)> 65535
From backend (#4)> -1
From backend (#2)> 0

I changed like this:

2019-04-04 02:39:51.489 UTC  < RowDescription 35 #1 "set_config" 0 #0 25 #65535 -1 #0


> I would like to add timestamp per line and when command processing function
> start/end.
> I think it is useful to know the application process start/end for diagnosis.
> So I will implement like this;
> 
> 2019-03-03 07:24:54.142 UTC   PQgetResult start
> 2019-03-03 07:24:54.143 UTC   < 'T' 35 1 "set_config" 0 #0 25 #65535 -1 #0
> 2019-03-03 07:24:54.144 UTC   PQgetResult end
I would like to add this in next patch if there are not any disagreement.

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi,

I update patch to improve PQtrace(); output log message in one line.
Please find my attached patch.

How it changed:
> > The basic idea being:
> >
> > - Each line is a whole message.
> > - The line begins with <<< for a message received and >>> for a message
> sent.
> > - Strings in single quotes are those sent/received as a fixed number of
> bytes.
> > - Strings in double quotes are those sent/received as a string.
> > - 4-byte integers are printed unadorned.
> > - 2-byte integers are prefixed by #.
> > - I guess 1-byte integers would need some other prefix, maybe @ or ##.

New log output examples:
The message sent from frontend is like this;
2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT pg_catalog.set_config('search_path', '', false)" 

The message sent from backend is like this;
2019-04-04 02:39:51.489 UTC  < RowDescription 35 #1 "set_config" 0 #0 25 #65535 -1 #0


Regards,
Aya Iwata

Attachment

Re: libpq debug log

From
Kyotaro HORIGUCHI
Date:
Hello. Thank you for the new patch.

At Tue, 9 Apr 2019 06:19:32 +0000, "Iwata, Aya" <iwata.aya@jp.fujitsu.com> wrote in
<71E660EB361DF14299875B198D4CE5423DF161BA@g01jpexmbkw25>
> Hi,
> 
> I update patch to improve PQtrace(); output log message in one line.
> Please find my attached patch.
> 
> How it changed:
> > > The basic idea being:
> > >
> > > - Each line is a whole message.
> > > - The line begins with <<< for a message received and >>> for a message
> > sent.
> > > - Strings in single quotes are those sent/received as a fixed number of
> > bytes.
> > > - Strings in double quotes are those sent/received as a string.
> > > - 4-byte integers are printed unadorned.
> > > - 2-byte integers are prefixed by #.
> > > - I guess 1-byte integers would need some other prefix, maybe @ or ##.
> 
> New log output examples:
> The message sent from frontend is like this;
> 2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT pg_catalog.set_config('search_path', '', false)" 
> 
> The message sent from backend is like this;
> 2019-04-04 02:39:51.489 UTC  < RowDescription 35 #1 "set_config" 0 #0 25 #65535 -1 #0

I had a brief look on this.

+/* protocol message name */
+static char *command_text_b[] = {

Couldn't the name be more descriptive? The comment just above
doesn't seem consistent with the variable.  The tables are very
sparse. I think the definition could be in more compact form.

+    /* y */ 0,
+    /* z */ 0
+};
+#define COMMAND_BF_MAX (sizeof(command_text_bf) / sizeof(*command_text_bf))

It seems that at least the trailing 0-elements are not required.


+ * message_get_command_text:
+ *     Get Protocol message text from byte1 identifier
+ */
+static char *
+message_get_command_text(unsigned char c, CommunicationDirection direction)
..
+message_nchar(PGconn *conn, const char *v, int length, CommunicationDirection direction)

Also the function names are not very descriptive. 

+message_Int(PGconn *conn, int v, int length, CommunicationDirection direct)

We are not using names mixing CamelCase and undercored there.


+    if (c >= 0 && c < COMMAND_BF_MAX)
+    {
+        text = command_text_bf[c];
+        if (text)
+            return text;
+    }
+
+    if (direction == FROM_BACKEND && c >= 0 && c < COMMAND_B_MAX)
+    {
+        text = command_text_b[c];
+        if (text)
..
+    if (direction == FROM_FRONTEND && c >= 0 && c < COMMAND_F_MAX)


This code is assuming that elements of command_text_bf is
mutually exclusive with command_text_b or _bf. That is, the first
has an element for 'C', others don't have an element in the same
position. But _bf[C] = "CommandComplete" and _f[C] = "Close". Is
it working correctly?

+typedef enum CommunicationDirection

The type CommunicationDirection is two-member enum which is
equivalent to just a boolean. Is there a reason you define that?

+typedef enum State
+typedef enum Type

The name is too generic.
+typedef struct _LoggingMsg
...
+} LoggingMsg;

Why the tag name is prefixed with an underscore?

+typedef struct _Frontend_Entry

The name doesn't give an idea of its characteristics.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Horiguchi-san,

Thank you for your reviewing. 
I updated patch. Please see my attached patch.

> +/* protocol message name */
> +static char *command_text_b[] = {
> 
> Couldn't the name be more descriptive? The comment just above doesn't seem
> consistent with the variable.  The tables are very sparse. I think the
> definition could be in more compact form.
Thank you. I changed the description more clear.

> 
> +    /* y */ 0,
> +    /* z */ 0
> +};
> +#define COMMAND_BF_MAX (sizeof(command_text_bf) /
> +sizeof(*command_text_bf))
> 
> It seems that at least the trailing 0-elements are not required.
Sure. I removed.

> + * message_get_command_text:
> + *     Get Protocol message text from byte1 identifier
> + */
> +static char *
> +message_get_command_text(unsigned char c, CommunicationDirection
> +direction)
> ..
> +message_nchar(PGconn *conn, const char *v, int length,
> +CommunicationDirection direction)
> 
> Also the function names are not very descriptive.
Thank you. I fixed function names and added descriptions.

> 
> +message_Int(PGconn *conn, int v, int length, CommunicationDirection
> +direct)
> 
> We are not using names mixing CamelCase and undercored there.
> 
> 
> +    if (c >= 0 && c < COMMAND_BF_MAX)
> +    {
> +        text = command_text_bf[c];
> +        if (text)
> +            return text;
> +    }
> +
> +    if (direction == FROM_BACKEND && c >= 0 && c < COMMAND_B_MAX)
> +    {
> +        text = command_text_b[c];
> +        if (text)
> ..
> +    if (direction == FROM_FRONTEND && c >= 0 && c < COMMAND_F_MAX)
> 
> 
> This code is assuming that elements of command_text_bf is mutually exclusive
> with command_text_b or _bf. That is, the first has an element for 'C', others
> don't have an element in the same position. But _bf[C] = "CommandComplete"
> and _f[C] = "Close". Is it working correctly?
Elements sent from both the backend and the frontend are 'c' and 'd'.
There is no same elements in protocol_message_type_b and _bf.
The same applies to protocol_message_type_f and _bf too. So I think it is working correctly. 


> +typedef enum CommunicationDirection
> 
> The type CommunicationDirection is two-member enum which is equivalent to
> just a boolean. Is there a reason you define that?
> 
> +typedef enum State
> +typedef enum Type
> 
> The name is too generic.
> +typedef struct _LoggingMsg
> ...
> +} LoggingMsg;
> 
> Why the tag name is prefixed with an underscore?
> 
> +typedef struct _Frontend_Entry
> 
> The name doesn't give an idea of its characteristics.
Thank you. I fixed.

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"Jamison, Kirk"
Date:
Hi Aya-san,

I tested your v3 patch and it seemed to work on my Linux environment.
However, the CF Patch Tester detected a build failure (probably on Windows).
Refer to: http://commitfest.cputube.org/

Docs:
It would be better to have reference to the documentations of 
Frontend/Backend Protocol's "Message Format".

Code:
There are some protocol message types from frontend
that you missed indicating (non BYTE1 types):
CancelRequest (F), StartupMessage (F), SSLRequest (F).

Although I haven't tested those actual protocols,
I assume it will be printed as the following since the length and
message will still be recognized.
ex. Timestamp 8 80877103

So you need to indicate these protocol message types as intended.
ex. Timestamp > SSLRequest 8 80877103


Regards,
Kirk Jamison




RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi Kirk,

Thank you for your reviewing.

> Docs:
> It would be better to have reference to the documentations of Frontend/Backend
> Protocol's "Message Format".
I added a link to "Protocol's Message Formats" and little explanation to PQtrace() documentation.

> Code:
> There are some protocol message types from frontend that you missed indicating
> (non BYTE1 types):
> CancelRequest (F), StartupMessage (F), SSLRequest (F).
> 
> Although I haven't tested those actual protocols, I assume it will be printed
> as the following since the length and message will still be recognized.
> ex. Timestamp 8 80877103
> 
> So you need to indicate these protocol message types as intended.
> ex. Timestamp > SSLRequest 8 80877103
Thank you. I changed code to output these information.
For that, I added code to check the int32 content which  StartupMessage (F) and SSLRequest (F) have.
And CancelRequest is not targeted because it calls send() directly.

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi, 

I rebased my patch from head. Please find my attached patch.

Regard,
Aya Iwata

Attachment

RE: libpq debug log

From
"Iwata, Aya"
Date:
Hi, 

This is a summary of the whole thread.
I am currently improving PQtrace() by adjusting its output to one line per protocol message as per the advice of
reviewers.

Purpose:
If a problem occurs, such as a slow query, you want to know which query takes time.
In Current libpq, there is PQtrace(FILE *filename) facility to output exchanging protocol messages to file.
But I think current PQtrace() has following issues: 
* The output is confusing. It is difficult to analyze information because it is printed one by one, and only a
characterrepresenting a message (ex. printed 'T' means RowDescription).
 
* Timestamp is not output. So we cannot identify which process took a long time. That would be possible when we compare
timestamps.
* PQtrace() code must be included in libpq application's source code. If you want to get log, you should change code
andre-compile it for logging. Some application cannot do this.
 

Compared to tcpdump:
There is tcpdump for similar use, but it has the following problems:
- Windows users cannot use it.
- If the communication is encrypted, it is possible that you may not see the information you want as explained by
Andres.
- Information can only be retrieved by limited users due to OS permissions.

Solution:
Work on following improvements in order:
1. Adjusting it to emit one line per protocol message and output timestamp.
2. Enables logging control without recompiling the application. 
   I thought it would be better to control it with parameters. However since this method is controversial (Security
implications,etc.), we will consider a good method after completing 1.   
 

Latest patch just contains 1. Hence, the usage of this feature is the same as current PQtrace().

Example of log output:
In current PQtrace log:

To backend> Msg Q
                   
 
To backend> "SELECT pg_catalog.set_config('search_path', '', false)"
To backend> Msg complete, length 60

I changed like this:

2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT pg_catalog.set_config('search_path', '', false)"

I appreciate your advice regarding the one line protocol message. Thank you in advance.

Regards,
Aya Iwata

Attachment

Re: libpq debug log

From
Alvaro Herrera from 2ndQuadrant
Date:
Hello,

Nice patch.  I think there's pretty near consensus that this is
something we want, and that the output format of one trace msg per libpq
msg is roughly okay.  (I'm not sure there's 100% agreement on this last
point, but it seems close enough.)

> I changed like this:
> 
> 2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT pg_catalog.set_config('search_path', '', false)"

The "59" there seems quite odd, though.

* What is this in pg_conn?  I don't understand why this array is of size
  MAXPGPATH.
+    PGFrontendLogMsgEntry frontend_entry[MAXPGPATH];
  This seems to make pg_conn much larger than we'd like.

Minor review points:

* Do we need COMMAND_B_MIN etc to reduce the number of zeroes at the
beginning of the tables?

* The place where you define TRACELOG_TIME_SIZE seems like it'll be
  unavoidably out of date if somebody changes the format string.  I'd put
  that #define next to where the fmt appears.

* "output a a null-terminated" has duplicated "a"

* We don't add braces around single-statement blocks (pqPutMsgEnd)

* Please pgindent.


Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hello,

Thank you for your review.
I update patch. Please find attached my patch.


> > 2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT
> pg_catalog.set_config('search_path', '', false)"
> 
> The "59" there seems quite odd, though.
Could you explain more detail about this?

"59" is length of protocol message contents. (It does not contain first 1 byte.) 
This order is based on the message format.
https://www.postgresql.org/docs/current/protocol-message-formats.html


> * What is this in pg_conn?  I don't understand why this array is of size
>   MAXPGPATH.
> +    PGFrontendLogMsgEntry frontend_entry[MAXPGPATH];
>   This seems to make pg_conn much larger than we'd like.
Sure. I remove this and change code to use list.

> Minor review points:
I accept all these review points.

Regards,
Aya Iwata

Attachment

Re: libpq debug log

From
Alvaro Herrera
Date:
Hello Aya Iwata,

I like this patch, and I think we should have it.  I have updated it, as
it had conflicts.

In 0002, I remove use of ftime(), because that function is obsolete.
However, with that change we lose printing of milliseconds in the trace
lines.  I think that's not great, but I also think we can live without
them until somebody gets motivated to write the code for that.  It seems
a little messy so I'd prefer to leave that for a subsequent commit.
(Maybe we can just use pg_strftime.)

Looking at the message type tables, I decided to get rid of the "bf"
table, which had only two bytes, and instead put CopyData and CopyDone
in the other two tables.  That seems simpler.  Also, the COMMAND_x_MAX
macros were a bit pointless; I just expanded at the only caller of each,
using lengthof().  And since the message type is unsigned, there's no
need to do "c >= 0" since it must always be true.

I added setlinebuf() to the debug file.  Works better than without,
because "tail -f /tmp/libpqtrace.log" works as you'd expect.

One thing that it might be good to do is to avoid creating the message
type tables as const but instead initialize them if and only if tracing
is enabled, on the first call.  I think that would not only save space
in the constant area of the library for the 99.99% of the cases where
tracing isn't used, but also make the initialization code be more
sensible (since presumably you wouldn't have to initialize all the
zeroes.)

Opinions?  I experimented by patching psql as below (not intended for
commit) and it looks good.  Only ErrorResponse prints the terminator as
a control character, or something:

2020-09-16 13:27:29.072 -03  < ErrorResponse 117 S "ERROR" V "ERROR" C "42704" M "no existe el slot de replicación
«foobar»"F "slot.c" L "408" R "ReplicationSlotAcquireInternal" ^@ 
 


diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 8232a0143b..01728ba8e8 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -301,6 +301,11 @@ main(int argc, char *argv[])
 
     psql_setup_cancel_handler();
 
+    {
+        FILE *trace = fopen("/tmp/libpqtrace.log", "a+");
+        PQtrace(pset.db, trace);
+    }
+
     PQsetNoticeProcessor(pset.db, NoticeProcessor, NULL);
 
     SyncVariables();

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Alvaro san, 

Thank you for your update :)

> Opinions?  I experimented by patching psql as below (not intended for
> commit) and it looks good.  Only ErrorResponse prints the terminator as a
> control character, or something:
I check code, changes and email. I agree with all of this. 
I will review code, feature and performance if it is needed more closely. 
(I'll start it next week.)

Regards,
Aya Iwata

> -----Original Message-----
> From: Alvaro Herrera <alvherre@2ndquadrant.com>
> Sent: Thursday, September 17, 2020 5:42 AM
> To: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> Cc: pgsql-hackers@postgresql.org; tgl@sss.pgh.pa.us;
> robertmhaas@gmail.com; pchampion@pivotal.io; jdoty@pivotal.io;
> raam.soft@gmail.com; Nagaura, Ryohei/永浦 良平
> <nagaura.ryohei@fujitsu.com>; nagata@sraoss.co.jp;
> peter.eisentraut@2ndquadrant.com; 'Kyotaro HORIGUCHI'
> <horiguchi.kyotaro@lab.ntt.co.jp>; Jamison, Kirk/ジャミソン カーク
> <k.jamison@fujitsu.com>
> Subject: Re: libpq debug log
> 
> Hello Aya Iwata,
> 
> I like this patch, and I think we should have it.  I have updated it, as it had
> conflicts.
> 
> In 0002, I remove use of ftime(), because that function is obsolete.
> However, with that change we lose printing of milliseconds in the trace lines.
> I think that's not great, but I also think we can live without them until
> somebody gets motivated to write the code for that.  It seems a little messy
> so I'd prefer to leave that for a subsequent commit.
> (Maybe we can just use pg_strftime.)
> 
> Looking at the message type tables, I decided to get rid of the "bf"
> table, which had only two bytes, and instead put CopyData and CopyDone in
> the other two tables.  That seems simpler.  Also, the COMMAND_x_MAX
> macros were a bit pointless; I just expanded at the only caller of each, using
> lengthof().  And since the message type is unsigned, there's no need to do "c
> >= 0" since it must always be true.
> 
> I added setlinebuf() to the debug file.  Works better than without, because
> "tail -f /tmp/libpqtrace.log" works as you'd expect.
> 
> One thing that it might be good to do is to avoid creating the message type
> tables as const but instead initialize them if and only if tracing is enabled, on
> the first call.  I think that would not only save space in the constant area of
> the library for the 99.99% of the cases where tracing isn't used, but also make
> the initialization code be more sensible (since presumably you wouldn't have
> to initialize all the
> zeroes.)
> 
> Opinions?  I experimented by patching psql as below (not intended for
> commit) and it looks good.  Only ErrorResponse prints the terminator as a
> control character, or something:
> 
> 2020-09-16 13:27:29.072 -03  < ErrorResponse 117 S "ERROR" V "ERROR" C
> "42704" M "no existe el slot de replicación «foobar»" F "slot.c" L "408" R
> "ReplicationSlotAcquireInternal" ^@
> 
> 
> diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c index
> 8232a0143b..01728ba8e8 100644
> --- a/src/bin/psql/startup.c
> +++ b/src/bin/psql/startup.c
> @@ -301,6 +301,11 @@ main(int argc, char *argv[])
> 
>      psql_setup_cancel_handler();
> 
> +    {
> +        FILE *trace = fopen("/tmp/libpqtrace.log", "a+");
> +        PQtrace(pset.db, trace);
> +    }
> +
>      PQsetNoticeProcessor(pset.db, NoticeProcessor, NULL);
> 
>      SyncVariables();
> 
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Wed, 16 Sep 2020 17:41:55 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
> Hello Aya Iwata,
>
> I like this patch, and I think we should have it.  I have updated it, as
> it had conflicts.
>
> In 0002, I remove use of ftime(), because that function is obsolete.
> However, with that change we lose printing of milliseconds in the trace
> lines.  I think that's not great, but I also think we can live without
> them until somebody gets motivated to write the code for that.  It seems
> a little messy so I'd prefer to leave that for a subsequent commit.
> (Maybe we can just use pg_strftime.)
>
> Looking at the message type tables, I decided to get rid of the "bf"
> table, which had only two bytes, and instead put CopyData and CopyDone
> in the other two tables.  That seems simpler.  Also, the COMMAND_x_MAX
> macros were a bit pointless; I just expanded at the only caller of each,
> using lengthof().  And since the message type is unsigned, there's no
> need to do "c >= 0" since it must always be true.
>
> I added setlinebuf() to the debug file.  Works better than without,
> because "tail -f /tmp/libpqtrace.log" works as you'd expect.
>
> One thing that it might be good to do is to avoid creating the message
> type tables as const but instead initialize them if and only if tracing
> is enabled, on the first call.  I think that would not only save space
> in the constant area of the library for the 99.99% of the cases where
> tracing isn't used, but also make the initialization code be more
> sensible (since presumably you wouldn't have to initialize all the
> zeroes.)
>
> Opinions?  I experimented by patching psql as below (not intended for
> commit) and it looks good.  Only ErrorResponse prints the terminator as
> a control character, or something:
>
> 2020-09-16 13:27:29.072 -03  < ErrorResponse 117 S "ERROR" V "ERROR" C "42704" M "no existe el slot de replicación
«foobar»"F "slot.c" L "408" R "ReplicationSlotAcquireInternal" ^@  
>
>
> diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
> index 8232a0143b..01728ba8e8 100644
> --- a/src/bin/psql/startup.c
> +++ b/src/bin/psql/startup.c
> @@ -301,6 +301,11 @@ main(int argc, char *argv[])
>
>      psql_setup_cancel_handler();
>
> +    {
> +        FILE *trace = fopen("/tmp/libpqtrace.log", "a+");
> +        PQtrace(pset.db, trace);
> +    }
> +
>      PQsetNoticeProcessor(pset.db, NoticeProcessor, NULL);
>
>      SyncVariables();

I saw that version and have some comments.

+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
+    const char *message_type;

Compiler complains that this is unused.

+static const char *
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
...
+    else
+        return "UnknownCommand";
+}

Compiler complains as "control reached end of non-void function"


+pqLogMsgString(PGconn *conn, const char *v, int length, PGCommSource commsource)
+{
+    if (length < 0)
+        length = strlen(v) + 1;
+
+    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+    {
+        switch (conn->logging_message.state)
+        {
+            case LOG_CONTENTS:
+                fprintf(conn->Pfdebug, "\"%s\" ", v);
+                pqLogLineBreak(length, conn);

pqLogMsgString(conn, str, -1, FROM_*) means actual length may be
different from the caller thinks, but the pqLogLineBreak() subtracts
that value from the message length rememberd in in logging_message.
Anyway AFAICS the patch doesn't use the code path so we should remove
the first two lines.

+            case LOG_CONTENTS:
+                fprintf(conn->Pfdebug, "%s%d ", prefix, v);

That prints #65535 for v = -1 and length = 2. I think it should be
properly expanded as a signed integer.


@@ -139,8 +447,7 @@ pqGets_internal(PQExpBuffer buf, PGconn *conn, bool resetbuffer)
     conn->inCursor = ++inCursor;

     if (conn->Pfdebug)
-        fprintf(conn->Pfdebug, "From backend> \"%s\"\n",
-                buf->data);
+        pqLogMsgString(conn, buf->data, buf->len + 1, FROM_BACKEND);

By the way, appendBinaryPQExpBuffer() enlarges its buffer by the size
of the exact length of the given data, but appends '\0' at the end of
the copied data. Couldn't that leads to an memory overrun?


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Alvaro Herrera
Date:
On 2020-Oct-09, Kyotaro Horiguchi wrote:

> I saw that version and have some comments.
> 
> +pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
> +{
> +    const char *message_type;
> 
> Compiler complains that this is unused.
> 
> +static const char *
> +pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
> +{
> ...
> +    else
> +        return "UnknownCommand";
> +}
> 
> Compiler complains as "control reached end of non-void function"

Yeah, those two warnings are caused by the same problem, namely that I
was editing this function to make it simpler and apparently the patch
version I sent does not include all such changes.  The fix is to remove
the message_type variable and have the two assignments be "return".

> +pqLogMsgString(PGconn *conn, const char *v, int length, PGCommSource commsource)
> +{
> +    if (length < 0)
> +        length = strlen(v) + 1;
> +

> pqLogMsgString(conn, str, -1, FROM_*) means actual length may be
> different from the caller thinks, but the pqLogLineBreak() subtracts
> that value from the message length rememberd in in logging_message.
> Anyway AFAICS the patch doesn't use the code path so we should remove
> the first two lines.

True, +1 for removing it.

> By the way, appendBinaryPQExpBuffer() enlarges its buffer by the size
> of the exact length of the given data, but appends '\0' at the end of
> the copied data. Couldn't that leads to an memory overrun?

Doesn't enlargePQExpBuffer() include room for the trailing zero?  I
think it does.



Re: libpq debug log

From
Robert Haas
Date:
On Wed, Sep 16, 2020 at 4:41 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> 2020-09-16 13:27:29.072 -03  < ErrorResponse 117 S "ERROR" V "ERROR" C "42704" M "no existe el slot de replicación
«foobar»"F "slot.c" L "408" R "ReplicationSlotAcquireInternal" ^@ 

Ooh, that looks really nice. The ^@ at the end seems odd, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Fri, 9 Oct 2020 11:48:59 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in 
> > +pqLogMsgString(PGconn *conn, const char *v, int length, PGCommSource commsource)
> > +{
> > +    if (length < 0)
> > +        length = strlen(v) + 1;
> > +
> 
> > pqLogMsgString(conn, str, -1, FROM_*) means actual length may be
> > different from the caller thinks, but the pqLogLineBreak() subtracts
> > that value from the message length rememberd in in logging_message.
> > Anyway AFAICS the patch doesn't use the code path so we should remove
> > the first two lines.
> 
> True, +1 for removing it.
> 
> > By the way, appendBinaryPQExpBuffer() enlarges its buffer by the size
> > of the exact length of the given data, but appends '\0' at the end of
> > the copied data. Couldn't that leads to an memory overrun?
> 
> Doesn't enlargePQExpBuffer() include room for the trailing zero?  I
> think it does.

Right. I faintly recall I said the same thing before..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Alvaro Herrera
Date:
Hello Aya Iwata

I've been hacking at this patch again.  There were a few things I wasn't
too clear about, so I reordered the code and renamed the routines to try
to make it easier to follow.

One thing I didn't like very much is that all the structures and enums
were exposed to the world in libq-int.h.  Because some enum members have
pretty generic names, I didn't like that much, so I moved the whole
thing to fe-misc.c, and renamed the structs.  Also, the arrays don't
take space unless PQtrace() is called.  (This is not what I was talking
about in my previous message.  The idea I was trying to explain in my
previous message cannot possibly work, so I abandoned it.)

I also renamed functions to make it clear which handles frontend and
which handles backend.  With that, it was pretty obvious that we had an
"reset BE message" in the routine to handle FE, and some clearing of FE
in the code that handles BE.  I fixed things in a way that I think makes
the most sense.

I noticed that it's theoretically possible to have a FE message so large
(more than MAXPGPATH pieces) that it would overrun the array; so I added
a "print message" call after adding one piece, to avoid this.  Also, why
MAXPGPATH?  I added a new symbol MAX_FRONTEND_MSGS for this purpose.

There are some things still to do:

0. I added a XXX comment to pqFlush.  Because we're storing messages in
fe_msgs that point to the libpq buffer, is it possible to end up with
trace messages that are pointing into outBuffer bytes that are already
sent, and perhaps even overwritten with newer bytes?  Maybe not, but
it's unclear.  Should we do pqLogFrontendMsg() preventively to avoid
this problem?

1. Is the handling of protocol 2 correct?  Since it's completely
separate from protocol 3, I have not even looked at what it produces.
But the fact that pqLogMsgByte1 completely ignores the "commsource"
argument makes me suspect that it's not correct.
1a. How do we even test protocol 2 handling?

2. We need a mode to suppress print of time; this would be useful to
be able to test libpq at a low level.  Maybe instead of time we can
print a monotonically-increasing packet sequence number.  With this, we
could easily add tests for libpq itself.  What user interface do we want
for this?  Maybe we can add an "bits32 flags" parameter to PQtrace(),
with one bit for this use.

3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good.

4. Even in text format, COPY output is not very useful.  How can we
improve that?

5. Error messages are still printing the terminating zero byte.  I
suggest that it should be suppressed.

6. Let's redefine pqTraceMaybeBreakLine() (I renamed from
pqLogLineBreak):  If message is complete, print a newline; if message is
not complete, print a " ".  Then, remove the whitespace after printing
each element.  This should lead to lines that don't have an useless " "
at the end.

7. Why does it make sense to call pqTraceMaybeBreakLine when
commsource=FROM_FRONTEND?

Attachment

RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Alvaro san,

Thank you for your email.
I will review this updated patch and I will let you know when I complete.
Please wait a moment.

Best regards,
Aya Iwata

> -----Original Message-----
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Sent: Tuesday, October 27, 2020 1:23 AM
> To: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> Cc: pgsql-hackers@postgresql.org; tgl@sss.pgh.pa.us;
> robertmhaas@gmail.com; pchampion@pivotal.io; jdoty@pivotal.io;
> raam.soft@gmail.com; Nagaura, Ryohei/永浦 良平
> <nagaura.ryohei@fujitsu.com>; nagata@sraoss.co.jp;
> peter.eisentraut@2ndquadrant.com; 'Kyotaro HORIGUCHI'
> <horiguchi.kyotaro@lab.ntt.co.jp>; Jamison, Kirk/ジャミソン カーク
> <k.jamison@fujitsu.com>
> Subject: Re: libpq debug log
>
> Hello Aya Iwata
>
> I've been hacking at this patch again.  There were a few things I wasn't too
> clear about, so I reordered the code and renamed the routines to try to make it
> easier to follow.
>
> One thing I didn't like very much is that all the structures and enums were
> exposed to the world in libq-int.h.  Because some enum members have
> pretty generic names, I didn't like that much, so I moved the whole thing to
> fe-misc.c, and renamed the structs.  Also, the arrays don't take space unless
> PQtrace() is called.  (This is not what I was talking about in my previous
> message.  The idea I was trying to explain in my previous message cannot
> possibly work, so I abandoned it.)
>
> I also renamed functions to make it clear which handles frontend and which
> handles backend.  With that, it was pretty obvious that we had an "reset BE
> message" in the routine to handle FE, and some clearing of FE in the code that
> handles BE.  I fixed things in a way that I think makes the most sense.
>
> I noticed that it's theoretically possible to have a FE message so large (more
> than MAXPGPATH pieces) that it would overrun the array; so I added a "print
> message" call after adding one piece, to avoid this.  Also, why MAXPGPATH?
> I added a new symbol MAX_FRONTEND_MSGS for this purpose.
>
> There are some things still to do:
>
> 0. I added a XXX comment to pqFlush.  Because we're storing messages in
> fe_msgs that point to the libpq buffer, is it possible to end up with trace
> messages that are pointing into outBuffer bytes that are already sent, and
> perhaps even overwritten with newer bytes?  Maybe not, but it's unclear.
> Should we do pqLogFrontendMsg() preventively to avoid this problem?
>
> 1. Is the handling of protocol 2 correct?  Since it's completely separate from
> protocol 3, I have not even looked at what it produces.
> But the fact that pqLogMsgByte1 completely ignores the "commsource"
> argument makes me suspect that it's not correct.
> 1a. How do we even test protocol 2 handling?
>
> 2. We need a mode to suppress print of time; this would be useful to be able to
> test libpq at a low level.  Maybe instead of time we can print a
> monotonically-increasing packet sequence number.  With this, we could
> easily add tests for libpq itself.  What user interface do we want for this?
> Maybe we can add an "bits32 flags" parameter to PQtrace(), with one bit for
> this use.
>
> 3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good.
>
> 4. Even in text format, COPY output is not very useful.  How can we improve
> that?
>
> 5. Error messages are still printing the terminating zero byte.  I suggest that
> it should be suppressed.
>
> 6. Let's redefine pqTraceMaybeBreakLine() (I renamed from
> pqLogLineBreak):  If message is complete, print a newline; if message is not
> complete, print a " ".  Then, remove the whitespace after printing each
> element.  This should lead to lines that don't have an useless " "
> at the end.
>
> 7. Why does it make sense to call pqTraceMaybeBreakLine when
> commsource=FROM_FRONTEND?



Re: libpq debug log

From
Greg Nancarrow
Date:
On Tue, Oct 27, 2020 at 3:23 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I've been hacking at this patch again.  There were a few things I wasn't
> too clear about, so I reordered the code and renamed the routines to try
> to make it easier to follow.
>

Hi,

Hopefully Iwata-san will return to looking at this soon.

I have tried the latest patch a little (using "psql" as my client),
and have a few small comments so far:

- In pqTraceInit(), in the (admittedly rare) case that fe_msg malloc()
fails, I'd NULL out be_msg too after free(), rather than leave it
dangling (because if pgTraceInit() was ever invoked again, as the
comment says it could, it would result in previously freed memory
being accessed ...)

conn->fe_msg = malloc(MAX_FRONTEND_MSGS * sizeof(pqFrontendMessage));
if (conn->fe_msg == NULL)
{
    free(conn->be_msg);
    conn->be_msg = NULL;
    return false;
}

- >3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good.

That seemed to happen for me only if COPYing binary format to stdout.

    UnknownCommand :::Invalid Protocol

- >5. Error messages are still printing the terminating zero byte.  I
  >suggest that it should be suppressed.

Perhaps there's a more elegant way of doing it, but I got rid of the
logging of the zero byte using the following change to
pgLogMsgByte1(), though there still seems to be a trailing space
issue:

                        case LOG_CONTENTS:
-                               fprintf(conn->Pfdebug, "%c ", v);
+                               if (v != '\0')
+                                   fprintf(conn->Pfdebug, "%c ", v);
                                pqTraceMaybeBreakLine(sizeof(v), conn);
                                break;

Regards,
Greg Nancarrow
Fujitsu Australia



RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Alvaro san,


> There are some things still to do:
I worked on some to do.

> 1. Is the handling of protocol 2 correct?
Protocol 3.0 is implemented in PostgreSQL v7.4 or later. Therefore, most servers and clients today want to connect
using3.0.  
Also, wishful thinking, I think Protocol 2.0 will no longer be supported. So I didn't develop it aggressively.
Another reason I'm concerned about implementing it would make the code less maintainable.

> 5. Error messages are still printing the terminating zero byte. I
> suggest that it should be suppressed.
I suppressed to print terminating zero byte like this;
2020-12-15 00:54:09 UTC < ErrorResponse 100 S "ERROR" V "ERROR" C "42P01" M "relation "a" does not exist" P "15" F
"parse_relation.c"L "1379" R "parserOpenTable" 0 

I thought about not outputting it, but the document said that if zero,
 it was the last message, so I made it output "0".

> 6. Let's redefine pqTraceMaybeBreakLine() (I renamed from
> pqLogLineBreak):
Sure. I redefined this function.

> 7. Why does it make sense to call pqTraceMaybeBreakLine when
> commsource=FROM_FRONTEND?
Backend messages include break line. However frontend messages don’t have it. So I call this functions.

Pending lists.
>  0. XXX comments
There were 4 XXX comments in Alvaro san's patch. 3 XXX comments are left in current patch.
I will answer this in next e-mail.

> 2. We need a mode to suppress print of time;
> 3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good.
> 4. Even in text format, COPY output is not very useful.  How can we
improve that?


Regards,
Aya Iwata
Fujitsu

Attachment

RE: libpq debug log

From
"k.jamison@fujitsu.com"
Date:
On Tuesday, December 15, 2020 8:12 PM, Iwata-san wrote:
> > There are some things still to do:
> I worked on some to do.

Hi Iwata-san,

Thank you for updating the patch.
I would recommend to register this patch in the upcoming commitfest
to help us keep track of it. I will follow the thread to provide more
reviews too.

> > 1. Is the handling of protocol 2 correct?
> Protocol 3.0 is implemented in PostgreSQL v7.4 or later. Therefore, most
> servers and clients today want to connect using 3.0.
> Also, wishful thinking, I think Protocol 2.0 will no longer be supported. So I
> didn't develop it aggressively.
> Another reason I'm concerned about implementing it would make the code
> less maintainable.

Though I have also not tested it with 2.0 server yet, do I have to download 7.3
version to test that isn't it? Silly question, do we still want to have this
feature for 2.0?
I understand that protocol 2.0 is still supported, but it is only used for
PostgreSQL versions 7.3 and earlier, which is not updated by fixes anymore
since we only backpatch up to previous 5 versions. However I am not sure if
it's a good idea, but how about if we just support this feature for protocol 3.
There are similar chunks of code in fe-exec.c, PQsendPrepare(), PQsendDescribe(),
etc. that already do something similar, so I just thought in hindsight if we can
do the same.
    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
    {
        <new code here>
    }
    else
    {
        /* Protocol 2.0 isn't supported */
         printfPQExpBuffer(&conn->errorMessage,
                           libpq_gettext("function requires at least protocol version 3.0\n"));
         return 0;
    }

But if it's necessary to provide this improved trace output for 2.0 servers, then ignore what
I suggested above, and although difficult I think we should test for protocol 2.0 in older server.

Some minor code comments I noticed:
1.
+    LOG_FIRST_BYTE,                /* logging the first byte identifing the
+                                 * protocol message type */

"identifing" should be "identifying". And closing */ should be on 3rd line.

2.
+            case LOG_CONTENTS:
+                /* Suppresses printing terminating zero byte */

--> Suppress printing of terminating zero byte

Regards,
Kirk Jamison



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: k.jamison@fujitsu.com <k.jamison@fujitsu.com>
> I understand that protocol 2.0 is still supported, but it is only used for
> PostgreSQL versions 7.3 and earlier, which is not updated by fixes anymore
> since we only backpatch up to previous 5 versions. However I am not sure if
> it's a good idea, but how about if we just support this feature for protocol 3.

+1
I thought psqlODBC (still) allows the user to choose the protocol version, it looks like the current psqlODBC disallows
pre-3protocol: 

[libpq_connect()]
    MYLOG(0, "libpq connection to the database established.\n");
    pversion = PQprotocolVersion(pqconn);
    if (pversion < 3)
    {
        MYLOG(0, "Protocol version %d is not supported\n", pversion);
        goto cleanup;
    }


> There are similar chunks of code in fe-exec.c, PQsendPrepare(),
> PQsendDescribe(),
> etc. that already do something similar, so I just thought in hindsight if we can
> do the same.
>     if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
>     {
>         <new code here>
>     }
>     else
>     {
>         /* Protocol 2.0 isn't supported */
>          printfPQExpBuffer(&conn->errorMessage,
>                            libpq_gettext("function requires at least protocol
> version 3.0\n"));
>          return 0;
>     }


I haven't looked at the patch and don't know the code structure, but I want the code clutter to be minimized.  So, I
thinkwe should avoid putting if statements like above here and there. 

Plus, I don't think it's not necessary to fail the processing when the protocol version is 2; we can just stop the
traceoutput.  So, how about disabling the trace output silently if the protocol turns out to be < 3 upon connection? 


Regards
Takayuki Tsunakawa






Re: libpq debug log

From
Masahiko Sawada
Date:
Iwata-san,

On Mon, Dec 21, 2020 at 5:20 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
>
> On Tuesday, December 15, 2020 8:12 PM, Iwata-san wrote:
> > > There are some things still to do:
> > I worked on some to do.
>
> Hi Iwata-san,
>
> Thank you for updating the patch.
> I would recommend to register this patch in the upcoming commitfest
> to help us keep track of it. I will follow the thread to provide more
> reviews too.
>
> > > 1. Is the handling of protocol 2 correct?
> > Protocol 3.0 is implemented in PostgreSQL v7.4 or later. Therefore, most
> > servers and clients today want to connect using 3.0.
> > Also, wishful thinking, I think Protocol 2.0 will no longer be supported. So I
> > didn't develop it aggressively.
> > Another reason I'm concerned about implementing it would make the code
> > less maintainable.
>
> Though I have also not tested it with 2.0 server yet, do I have to download 7.3
> version to test that isn't it? Silly question, do we still want to have this
> feature for 2.0?
> I understand that protocol 2.0 is still supported, but it is only used for
> PostgreSQL versions 7.3 and earlier, which is not updated by fixes anymore
> since we only backpatch up to previous 5 versions. However I am not sure if
> it's a good idea, but how about if we just support this feature for protocol 3.
> There are similar chunks of code in fe-exec.c, PQsendPrepare(), PQsendDescribe(),
> etc. that already do something similar, so I just thought in hindsight if we can
> do the same.
>         if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
>         {
>         <new code here>
>     }
>     else
>     {
>         /* Protocol 2.0 isn't supported */
>          printfPQExpBuffer(&conn->errorMessage,
>                            libpq_gettext("function requires at least protocol version 3.0\n"));
>          return 0;
>     }
>
> But if it's necessary to provide this improved trace output for 2.0 servers, then ignore what
> I suggested above, and although difficult I think we should test for protocol 2.0 in older server.
>
> Some minor code comments I noticed:
> 1.
> +       LOG_FIRST_BYTE,                         /* logging the first byte identifing the
> +                                                                * protocol message type */
>
> "identifing" should be "identifying". And closing */ should be on 3rd line.
>
> 2.
> +                       case LOG_CONTENTS:
> +                               /* Suppresses printing terminating zero byte */
>
> --> Suppress printing of terminating zero byte
>

The patch got some review comments a couple weeks ago but the patch
was still marked as "needs review", which was incorrect. Also cfbot[1]
is unhappy with this patch. So I'm switching the patch as "waiting on
author".

Regards,

[1] http://commitfest.cputube.org/


--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



RE: libpq debug log

From
"matsumura.ryo@fujitsu.com"
Date:
Hi Iwata-san

I reviewed v10-0001-libpq-trace.patch. (But I don't check recent discussion...)
I found some bugs.
I'm suggesting some refactoring.

****
@@ -6809,7 +6809,17 @@ PQtrace(PGconn *conn, FILE *debug_port)
+       if (pqTraceInit(conn))
+       {
+               conn->Pfdebug = debug_port;
+               setlinebuf(conn->Pfdebug);

  If debug_port is NULL, setlinebuf() causes segmentation fault.
  We should have policy that libpq-trace framework works only when
  Pfdebug is not NULL.

  For example, we should also think that PQtrace(conn, NULL) has same
  effect as PQuntrace(conn).


****
+       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,    /* \x65 ... \0x6d */

  There should be 9 zero, but 15 zero.


****
@@ -85,7 +228,7 @@ pqGetc(char *result, PGconn *conn)
    if (conn->Pfdebug)
-       fprintf(conn->Pfdebug, "From backend> %c\n", *result);
+       pqLogMsgByte1(conn, *result, FROM_BACKEND);

  I suggest to move confirming Pfdebug to pqLogMsgByte1() and
  other logging functions. If you want to avoid overhead of calling
  function, use macro function like the following:

    #define pqLogMsgByte1(CONN, CH, SOURCE) \
      ((CONN)->Pfdebug ? pqLogMsgByte1(CONN, CH, SOURCE) : 0)


****
@@ -168,7 +310,7 @@ pqPuts(const char *s, PGconn *conn)
    if (pqPutMsgBytes(s, strlen(s) + 1, conn))
        return EOF;
    if (conn->Pfdebug)
-       fprintf(conn->Pfdebug, "To backend> \"%s\"\n", s);
+       pqStoreFrontendMsg(conn, LOG_STRING, strlen(s) + 1);

  It's better that you store strlen(s) to local variable and use it.
  strlen(s) is not cost-free.


****
@@ -301,11 +431,15 @@ pqPutInt(int value, size_t bytes, PGconn *conn)
            tmp2 = pg_hton16((uint16) value);
            if (pqPutMsgBytes((const char *) &tmp2, 2, conn))
                return EOF;
+           if (conn->Pfdebug)
+               pqStoreFrontendMsg(conn, LOG_INT16, 2);
            break;
        case 4:
            tmp4 = pg_hton32((uint32) value);
            if (pqPutMsgBytes((const char *) &tmp4, 4, conn))
                return EOF;
+           if (conn->Pfdebug)
+               pqStoreFrontendMsg(conn, LOG_INT32, 4);
            break

  It's better to make the style same as pqGetInt().
  (It is not important.)

   switch(bytes) {
     case 2:
       type = LOG_INT16;
       break;
     case 4:
       type = LOG_INT32;
       break;
    }
    pqStoreFrontendMsg(conn, type, bytes);


****
+pqTraceInit(PGconn *conn)
+{
+   conn->fe_msg = malloc(MAX_FRONTEND_MSGS * sizeof(pqFrontendMessage));
+   if (conn->fe_msg == NULL)
+   {
+       free(conn->be_msg);
+       return false;

  Maybe, we need to clear conn->be_msg by NULL for next calling pqTraceInit().

****
+pqTraceInit(PGconn *conn)
+{
+   conn->fe_msg = malloc(MAX_FRONTEND_MSGS * sizeof(pqFrontendMessage));
+   conn->n_fe_msgs = 0;

  Data structure design is odd.
  Frontend message is basically constructed as the following:
  - message type
  - message length
  - field
  - field
  - field

  So the deisign may be as the follwong and remove n_fe_msg from pg_conn.

  typedef struct pqFrontendMessageField
  {
      int         offset_in_buffer;    message_addr is not real address.
      int         length;              message_length is wrong name. it is length of FIELD.
  } pqFrontendMessageField;

  typedef struct pqFrontendMessage
  {
      PGLogMsgDataType type;
      int         field_num;
      pqFrontendMessageField fields[FLEXIBLE_ARRAY_MEMBER];
  } pqFrontendMessage;

  And then pqTraceInit() is as the following.
  conn->fe_msg = malloc(sizeof(pqFrontendMessage) +
             MAX_FRONTEND_MSG_FIELDS * sizeof(pqFrontendMessage));
  conn->fe_msg->field_num = 0;


****
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+       return "UnknownCommand";

  It is "UnknownMessageType", isn't it?


****
+ * XXX -- ??
+ *     Message length is added at the last if message is sent by the frontend.
+ *     To arrange the log output format, frontend message contents are stored in the list.

  I understand as the following:
  * The message length is fixed after putting the last field,
  * but message length should be print before printing any field.
  * So we must store the field data in memory.


****
+pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type, int length)
+           case LOG_STRING:
+               memcpy(&message, conn->outBuffer + conn->outMsgEnd - length, length);
+               fprintf(conn->Pfdebug, "To backend> \"%c\"\n", message);

  %s is correct.
  At least, %c is not correct.


****
+pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type, int length)
+           case LOG_BYTE1:
+               memcpy(&message, conn->outBuffer + conn->outMsgEnd - length, length);

  It is unnecessary to call memcpy().
  LOG_BYTE1, LOG_STRING, and LOG_NCHAR can be passed its pointer directly to fprintf().
  You can also pass LOG_INT* data with casting to fprintf() without memcpy(), but I think either is fine.


****
+           case LOG_INT16:
+               fprintf(conn->Pfdebug, "To backend (#%d)> %c\n", length, result);
+           case LOG_INT32:
+               fprintf(conn->Pfdebug, "To backend (#%d)> %c\n", length, result);

  %d is correct.
  At least, %c is not correct.


****
+pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type, int length)
+   char        message;
+   uint16      result16 = 0;
+   uint32      result32 = 0;
+   int         result = 0;

  It's better that these variables declared in each block.
  Initializing result* is unnecessary.

  case LOG_INT16:
  {
      uint16 wk;
      memcpy(&wk, conn->outBuffer + conn->outMsgEnd - 2, 2);
      wk = pg_ntoh16(wk);
      fprintf(conn->Pfdebug, "To backend (#%d)> %h\n", 2, wk);
      break;
  }


****
+pqLogFrontendMsg(PGconn *conn)
+   int         message_addr;
+   int         length;
+   char        message;
+   uint16      result16 = 0;
+   uint32      result32 = 0;
+   int         result = 0;

  Same as pqStoreFrontendMsg().

+               memcpy(&message, conn->outBuffer + message_addr, length);

  Same as pqStoreFrontendMsg().


****
+pqLogMsgByte1(PGconn *conn, char v, PGCommSource commsource)
+   char       *message_source = commsource == FROM_BACKEND ? "<" : ">";

  I understand that 'commsource' means source, but message_source means direction of sending.
  It is better that the variable is named message_direction.


****
+pqLogMsgByte1(PGconn *conn, char v, PGCommSource commsource)
+            case LOG_FIRST_BYTE:
+               if (v == '\0')
+                   return;

  Maybe is it needed for packets whose msg_type is '\0'?

  I get the following output because pqLogMsgnchar() is called
  at conn->be_msg->state == LENGTH. I attach a program for reproduction.

      2020-12-25 00:58:48 UTC > StartupMessage :::Invalid Protocol

  I think confusing transition of state causes it.
  I suggest refactoring instead of fixing the above bug.

  msg_type of Startup packet, SSLRequest packet, and GSSNegotiate packet
  is '\0'. (I guess GSSNegotiate packet may be forgotten).
  At these packet, Byte1 is not sent actually, but if libpq-trace framework consider
  that it is sent, the transition may become more clear like the following:

    pqLogMsgByte1(PGconn *conn, char v, PGCommSource commsource)
    :
      fprintf(conn->Pfdebug, "%s %s ", timestr, message_source);
      /*
       * True type of message tagged '\0' is known when next 4 bytes is checked.
       * So we delay printing message type to pqLogMsgInt().
       */
      if (v != '\0')
        fprintf(conn->Pfdebug, "%s ",
                pqGetProtocolMsgType((unsigned char) v, commsource));

      conn->be_msg->state = LOG_LENGTH;
      conn->be_msg->command = v;

    pqLogMsgInt(PGconn *conn, int v, int length, PGCommSource commsource)
    :

        /* remove case LOG_FIRST_BYTE and... */

        case LOG_LENGTH:
          if (conn->be_msg->command == '\0')
          {
             /* We delayed to print message type for special message. */
             message_addr = conn->fe_msg[0].message_addr;
             memcpy(&result32, conn->outBuffer + message_addr, 4);
             result = (int) pg_ntoh32(result32);

             if (result == NEGOTIATE_SSL_CODE)
                 message_type = "SSLRequest";
             else if (result == NEGOTIATE_GSS_CODE)
                 message_type = "GSSRequest";
             else
                 message_type = "StartupMessage";
             fprintf(conn->Pfdebug, "%s ", message_type);
           }

           fprintf(conn->Pfdebug, "%d", v);
           conn->be_msg->length = v - length;
           conn->be_msg->state = LOG_CONTENTS;
           pqTraceMaybeBreakLine(0, conn);
           break;

****
+pqLogMsgByte1(PGconn *conn, char v, PGCommSource commsource)
+           case LOG_CONTENTS:
+               /* Suppresses printing terminating zero byte */
+               if (v == '\0')
+                   fprintf(conn->Pfdebug, "0");

  Is the above code for end of 'E' or 'N' message?
  We should comment because it is special case.
  Additionally, we may confuse whether 0 is numerical character or '\0'.

  /*
   * 'E' or 'N' message format is as the following:
   *    'E' len [byte1-tag string(null-terminated)]* \0(eof-tag)
   *
   * So we detect eof-tag at pqLogMsgByte1() with LOG_CONTENTS state.
   * The eof-tag is data so we should print it.
   * On the way, we care other non-ascii character.
   */
  if (!isascii(v))
      fprintf(conn->Pfdebug, "\\x%02x", v);


****
+pqLogMsgByte1(PGconn *conn, char v, PGCommSource commsource)
+   if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+   else
+       fprintf(conn->Pfdebug, "FROM backend> %c\n", v);

  Umm. Also see the following.

+pqLogMsgInt(PGconn *conn, int v, int length, PGCommSource commsource)
+   if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+   else
+       fprintf(conn->Pfdebug, "To backend (#%d)> %d\n", length, v);

  In the later case, "FROM backend" is correct.

  The bug is caused by confusing design.
  I suggest to refactor that:
  - Move callings fprintf() in pqStoreFrontendMsg() to each pqLogMsgXXX()
  - pqStoreFrontendMsg() calls each pqLogMsgXXX().


****
+pqLogMsgString(PGconn *conn, const char *v, int length, PGCommSource commsource)
+{
+   if (length < 0)
+       length = strlen(v) + 1;

  I cannot find length==-1 route.


****
+pqLogMsgnchar(PGconn *conn, const char *v, int length, PGCommSource commsource)
+               fprintf(conn->Pfdebug, "\'");
+               fwrite(v, 1, length, conn->Pfdebug);
+               fprintf(conn->Pfdebug, "\'");

  We should consider that 'v' is binary data.
  So it is better to convert non-ascii character to ascii (e.g. x%02x format).

  e.g. case of StartupPacket
    2020-12-25 08:07:42 UTC > StartupMessage 40 'userk5userdatabasetemplate1'

  The length of StartupPacket includes length itselft.
  The length of 'user...tepmlate1' is only 27.
  We could not find 8 bytes(40 - 4 - 27 = 8).

  If non-ascii character is printed, we can find them.
    2020-12-25 08:29:19 UTC > StartupMessage 40 '\x00\x03\x00\x00user\x00k5user\x00database\x00template1\x00\x00'

    static void
    pqLogMsgBinary(PGconn *conn, const char *v, int length, PGCommSource commsource)
    {
      int i, pin;
      for (pin = i = 0; i < length; ++i)
      {
        if (isprint(v[i]))
          continue;
        else
        {
          fwrite(v + pin, 1, i - pin, conn->Pfdebug);
          fprintf(conn->Pfdebug, "\\x%02x", v[i]);
          pin = i + 1;
        }
      }
      if (pin < length)
        fwrite(v + pin, 1, length - pin, conn->Pfdebug);
    }


****
@@ -123,6 +123,9 @@ pqParseInput3(PGconn *conn)
+           /* Terminate a half-finished logging message */
+           if (conn->Pfdebug)
+               pqTraceMaybeBreakLine(msgLength, conn);
            return;

  I understand that the above code only want to do [conn->be_msg->length = 0].
  If it is true, you should create new wrapper function like pqTraceForcelyTerminateMessage()
  and handleSyncLoss() itself should call pqTraceForcelyTerminateMessage().

    pqTraceForcelyTerminateMessage()
    {
      if (conn->be_msg->length > 0)
        fprintf(conn->Pfdebug, "\n");

      pqTraceResetBeMsg(conn);
    }


****
pqSaveParameterStatus(PGconn *conn, const char *name, const char *value)
    if (conn->Pfdebug)
        fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n",
                name, value);

  I think the above code has become to be unnecessary because new pqTrace framework
  become to print it.



Regards
Ryo Matsumura

Attachment

RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi,

Thank you for your review. I modified the code in response to those reviews.

This patch includes these items:
- Fix the code according to reviews
  - Fix COPY output issue
  - Change to not support Protocol v2.0
     It is rarely used anymore and de-support it makes code more simpler. Please see the discussion in this thread for
moredetails. 

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: iwata.aya@fujitsu.com <iwata.aya@fujitsu.com>
> This patch includes these items:

Is there anything else in this revision that is not handled?

Below are my comments.

Also, why don't you try running the regression tests with a temporary modification to PQtrace() to output the trace to
afile?  The sole purpose is to confirm that this patch makes the test crash (core dump). 


(1)
-    conn->Pfdebug = debug_port;
+    if (pqTraceInit(conn))
+    {
+        conn->Pfdebug = debug_port;
+        if (conn->Pfdebug != NULL)
+            setlinebuf(conn->Pfdebug);
+    }
+    else
+    {
+        /* XXX report ENOMEM? */
+        fprintf(conn->Pfdebug, "Failed to initialize trace support\n");
+        fflush(conn->Pfdebug);
+    }
 }

* When the passed debug_port is NULL, the function should return after calling PQuntrace().

* Is setlinebuf() available on Windows?  Maybe you should use setvbuf() instead.

* I don't see the need for separate pqTraceInit() function, because it is only called here.


(2)
+bool
+pqTraceInit(PGconn *conn)
+{
+    /* already done? */
+    if (conn->be_msg != NULL)
+    {

What's this code for?  I think it's sufficient that PQuntrace() frees b_msg and PQtrace() always allocates b_msg.


(3)
+    conn->fe_msg = malloc(sizeof(pqFrontendMessage) +
+                            MAX_FRONTEND_MSGS * sizeof(pqFrontendMessageField));
+    conn->fe_msg->field_num = 0;
+    if (conn->fe_msg == NULL)
+    {
+        free(conn->be_msg);
+        /* NULL out for the case that fe_msg malloc fails */
+        conn->be_msg = NULL;
+        return false;
+    }
+    conn->fe_msg->field_num = 0;

The memory for the fields array is one entry larger than necessary.  The allocation code would be:

+    conn->fe_msg = malloc(offsetof(pqFrontendMessage, fields) +
+                            MAX_FRONTEND_MSGS * sizeof(pqFrontendMessageField));

Also, the line with "field_num = 0" appears twice.  The first one should be removed.


(4)
+static void
+pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
+{
...
+    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+    {

I think you can remove the protocol version check in various logging functions, because PQtrace() disables logging when
theprotocol version is 2. 


(5)
@@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char *name, const char *value)
     pgParameterStatus *pstatus;
     pgParameterStatus *prev;

-    if (conn->Pfdebug)
-        fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n",
-                name, value);
-

Where is this information output instead?


(6)
+ * Set up state so that we can trace. NB -- this might be called mutiple

mutiple -> multiple


(7)
+    LOG_FIRST_BYTE,                /* logging the first byte identifing the
+                                   protocol message type */

identifing -> identifying


(8)
+#define pqLogMsgByte1(CONN, CH, SOURCE) \
+((CONN)->Pfdebug ? pqLogMessageByte1(CONN, CH, SOURCE) : 0)

For functions that return void, follow the style of CHECK_FOR_INTERRUPTS() defined in miscadmin.h.


(9)
+                currtime = time(NULL);
+                tmp = localtime(&currtime);
+                strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S %Z", tmp);
+
+                fprintf(conn->Pfdebug, "%s %s ", timestr, message_direction);

It's better to have microsecond accuracy, because knowing the accumulation of such fine-grained timings may help to
troubleshootheavily-loaded client cases.  You can mimic setup_formatted_log_time() in src/backend/utils/error/elog.c.
Thisis used for the %m marker in log_line_prefix. 


(10)
I think it's better to use tabs (or any other character that is less likely to appear in the log field) as a field
separator,because it makes it easier to process the log with a spreadsheet or some other tools. 


(11)
+                /*
+                 * True type of message tagged '\0' is known when next 4 bytes is
+                 * checked. So we delay printing message type to pqLogMsgInt()
+                 */
+                if (v != '\0')
+                    fprintf(conn->Pfdebug, "%s ",
+                            pqGetProtocolMsgType((unsigned char) v, commsource));

In what cases does '\0' appear as a message type?


(12)
+static const char *
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
+    if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
+        return protocol_message_type_b[c];
+    else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
+        return protocol_message_type_f[c];
+    else
+        return "UnknownMessage";
+}

This function returns NULL (=0) when protocol_message_type_b/f[c] is 0.  That crashes the caller:

+                    fprintf(conn->Pfdebug, "%s ",
+                            pqGetProtocolMsgType((unsigned char) v, commsource));

Plus, you may as well print the invalid message type.  Why don't you do something like this:

+static const char *
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
+    static char unknown_message[64];
+    char *msg = NULL;

+    if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
+        msg = protocol_message_type_b[c];
+    else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
+        msg = protocol_message_type_f[c];
+
+    if (msg == NULL)
+    {
+        sprintf(unknown_message, "Unknown message %02x", c);
+        msg = unknown_message;
+    }
+
+    return msg;
+}


(13)
@@ -156,7 +156,12 @@ pqParseInput3(PGconn *conn)
         {
             /* If not IDLE state, just wait ... */
             if (conn->asyncStatus != PGASYNC_IDLE)
+            {
+                /* Terminate a half-finished logging message */
+                if (conn->Pfdebug)
+                    pqTraceForcelyBreakLine(msgLength, conn);
                 return;
+            }

             /*
              * Unexpected message in IDLE state; need to recover somehow.

What's this situation like?  Why output a new line and reset the trace status?


(14)
+/* pqLogInvalidProtocol: Print that the protocol message is invalid */
+static void
+pqLogInvalidProtocol(PGconn *conn)
+{
+    fprintf(conn->Pfdebug, ":::Invalid Protocol\n");
+    conn->be_msg->state = LOG_FIRST_BYTE;
+}

Is it sufficient to just reset the state field?  Isn't it necessary to call pqTraceResetBeMsg() instead?


(15)
@@ -212,15 +368,8 @@ pqSkipnchar(size_t len, PGconn *conn)
...
     conn->inCursor += len;
-
     return 0;
 }

This is an unnecessary removal of an empty line.


(16)
+static void
+pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
+{
+    char       *message_direction = commsource == FROM_BACKEND ? "<" : ">";
...
+        switch (conn->be_msg->state)

This function handles messages in both directions.  But the switch condition is based on the backend message state
(b_msg). How is this correct? 


(17)
What's the difference between pqLogMsgString() and pqLogMsgnchar()?  They both take a length argument.  Do they have to
beseparate functions? 


(18)
     if (conn->Pfdebug)
-        fprintf(conn->Pfdebug, "To backend> %c\n", c);
+        pqStoreFrontendMsg(conn, LOG_BYTE1, 1);

To match the style for backend messages with pqLogMsgByte1 etc., why don't you wrap the conn->Pfdebug check in the
macro?


(19)
@@ -520,8 +667,6 @@ pqPutMsgStart(char msg_type, bool force_len, PGconn *conn)
         /* allow room for message length */
         endPos += 4;
     }
-    else
-        lenPos = -1;

Why is this change necessary?


(20)
+static void
+pqLogFrontendMsg(PGconn *conn)
...
+    for (i = 0; i < conn->fe_msg->field_num; i++)
+    {
+        message_addr = conn->fe_msg->fields[i].offset_in_buffer;
+        length = conn->fe_msg->fields[i].length;
+
+        switch (conn->fe_msg[i].type)
+        {
+            case LOG_BYTE1:

When I saw this switch condition, the confusion exploded.  Shouldn't the type should be the attribute of each field as
follows?

+        switch (conn->fe_msg->fields[i].type)

I also found the function names for frontend -> backend messages hard to grasp.  Could you show the flow of function
callswhen sending messages to the backend? 


(21)
+                uint16        result16;
+                memcpy(&result16, conn->outBuffer + message_addr, length);

You should have an empty line between the declaration part and execution part.


(22)
I couldn't understand pqLogBinaryMsg() at all.  Could you explain what it does?  Particularly, if all bytes in the
stringis printable, the function seems to do nothing: 

+        if (isprint(v[i]))
+            continue;

Should the following part be placed outside the for loop?  What does 'pin' mean? (I want the variable name to be more
intuitive.)

+        if (pin < length)
+            fwrite(v + pin, 1, length - pin, conn->Pfdebug);


Regards
Takayuki Tsunakawa




RE: libpq debug log

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Tsunakawa-san, Iwata-san,

> * Is setlinebuf() available on Windows?  Maybe you should use setvbuf() instead.

Yeah, cfbot2021 throws errors:
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.124922

```
  src/interfaces/libpq/fe-connect.c(6776): warning C4013: 'setlinebuf' undefined; assuming extern returning int
[C:\projects\postgresql\libpq.vcxproj]
```

The manpage of setlinebuf() suggests how to replace it, so you should follow.

```
       The setbuffer() function is the same, except that the size of the buffer is up to the caller, rather  than
being determined  by  the 
       default BUFSIZ.  The setlinebuf() function is exactly equivalent to the call:

           setvbuf(stream, NULL, _IOLBF, 0);
```

Hayato Kuroda
FUJITSU LIMITED




RE: libpq debug log

From
"k.jamison@fujitsu.com"
Date:
Hi Iwata-san,

In addition to Tsunakawa-san's comments,
The compiler also complains:
  fe-misc.c:678:20: error: ‘lenPos’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    conn->outMsgStart = lenPos;

There's no need for variable lenPos anymore since we have decided *not* to support pre protocol 3.0.
And by that we have to update the description of pqPutMsgStart too.
So you can remove the lenPos variable and the condition where you have to check for protocol version.

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 8bc9966..3de48be 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -644,14 +644,12 @@ pqCheckInBufferSpace(size_t bytes_needed, PGconn *conn)
  *
  * The state variable conn->outMsgStart points to the incomplete message's
  * length word: it is either outCount or outCount+1 depending on whether
- * there is a type byte.  If we are sending a message without length word
- * (pre protocol 3.0 only), then outMsgStart is -1.  The state variable
- * conn->outMsgEnd is the end of the data collected so far.
+ * there is a type byte.  The state variable conn->outMsgEnd is the end of
+ * the data collected so far.
  */
 int
 pqPutMsgStart(char msg_type, bool force_len, PGconn *conn)
 {
-       int                     lenPos;
        int                     endPos;

        /* allow room for message type byte */
@@ -661,9 +659,8 @@ pqPutMsgStart(char msg_type, bool force_len, PGconn *conn)
                endPos = conn->outCount;

        /* do we want a length word? */
-       if (force_len || PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+       if (force_len)
        {
-               lenPos = endPos;
                /* allow room for message length */
                endPos += 4;
        }
@@ -675,7 +672,7 @@ pqPutMsgStart(char msg_type, bool force_len, PGconn *conn)
        if (msg_type)
                conn->outBuffer[conn->outCount] = msg_type;
        /* set up the message pointers */
-       conn->outMsgStart = lenPos;
+       conn->outMsgStart = endPos;
        conn->outMsgEnd = endPos;



At the same time, the one below lacks one more zero. (Only has 8)
There should be 9 as Matsumura-san mentioned.
+    0, 0, 0, 0, 0, 0, 0, 0,     /* \x65 ... \0x6d */


The following can be removed in pqStoreFrontendMsg():
+ *        In protocol v2, we immediately print each message as we receive it.
+ *        (XXX why?)


Maybe the following description can be paraphrased:
+ *        The message length is fixed after putting the last field, but message
+ *        length should be print before printing any fields.So we must store the
+ *        field data in memory.
to:
+ *        The message length is fixed after putting the last field. But message
+ *        length should be printed before printing any field, so we must store
+ *        the field data in memory.


In pqStoreFrontendMsg, pqLogMessagenchar, pqLogMessageString,
pqLogMessageInt, pqLogMessageByte1, maybe it is unneccessary to use
+    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
because you have already indicated in the PQtrace() to return
silently when protocol 2.0 is detected.
+    /* Protocol 2.0 is not supported. */
+    if (PG_PROTOCOL_MAJOR(conn->pversion) < 3)
+        return;


In pqLogMessageInt(),
+    /* We delayed to print message type for special message. */
can be paraphrased to:
      /* We delay printing of the following special message_type */


Regards,
Kirk Jamison




Re: libpq debug log

From
Alvaro Herrera
Date:
On 2021-Jan-17, tsunakawa.takay@fujitsu.com wrote:

> * I don't see the need for separate pqTraceInit() function, because it is only called here.

That's true, but it'd require that we move PQtrace() to fe-misc.c,
because pqTraceInit() uses definitions that are private to that file.

> (2)
> +bool
> +pqTraceInit(PGconn *conn)
> +{
> +    /* already done? */
> +    if (conn->be_msg != NULL)
> +    {
> 
> What's this code for?  I think it's sufficient that PQuntrace() frees b_msg and PQtrace() always allocates b_msg.

The point is to be able to cope with a connection that calls PQtrace()
multiple times, with or without an intervening PQuntrace().
We'd make no friends if we made PQtrace() crash, or leak memory if
called a second time ...

> (5)
> @@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char *name, const char *value)
>      pgParameterStatus *pstatus;
>      pgParameterStatus *prev;
>  
> -    if (conn->Pfdebug)
> -        fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n",
> -                name, value);
> -
> 
> Where is this information output instead?

Hmm, seems to have been lost.  I restored it, but didn't verify
the resulting behavior carefully.

> (9)
> +                currtime = time(NULL);
> +                tmp = localtime(&currtime);
> +                strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S %Z", tmp);
> +
> +                fprintf(conn->Pfdebug, "%s %s ", timestr, message_direction);
> 
> It's better to have microsecond accuracy, because knowing the accumulation of such fine-grained timings may help to
troubleshootheavily-loaded client cases.  You can mimic setup_formatted_log_time() in src/backend/utils/error/elog.c.
Thisis used for the %m marker in log_line_prefix.
 

Yeah, it was me that removed printing of milliseconds -- Iwata-san's
original patch used ftime() which is not portable.  I had looked for
portable ways to get this but couldn't find anything that didn't involve
linking backend files, so I punted.  But looking again now, it is quite
simple: just use standard strftime and localtime and just concatenate
both together.  Similar to what setup_formatted_log_time except we don't
worry about the TZ at all.

> (10)
> I think it's better to use tabs (or any other character that is less likely to appear in the log field) as a field
separator,because it makes it easier to process the log with a spreadsheet or some other tools.
 

I can buy that.  I changed it to have some tabs; the lines are now:

timestamp "> or <" <tab> "message type" <tab> length <space> message contents

I think trying to apply tabs inside the message contents is going to be
more confusing than helpful.



> (12)
> [...]
> Plus, you may as well print the invalid message type.  Why don't you do something like this:
> 
> +static const char *
> +pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
> +{
> +    static char unknown_message[64];
> +    char *msg = NULL;
> 
> +    if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
> +        msg = protocol_message_type_b[c];
> +    else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
> +        msg = protocol_message_type_f[c];
> +
> +    if (msg == NULL)
> +    {
> +        sprintf(unknown_message, "Unknown message %02x", c);
> +        msg = unknown_message;
> +    }
> +
> +    return msg;
> +}

Right.  I had written something like this, but in the end decided not to
bother because it doesn't seem terribly important.  You can't do exactly
what you suggest, because it has the problem that you're returning a
stack-allocated variable even though your stack is about to be blown
away.  My copy had a static buffer that was malloc()ed on first use (and
if allocation fails, return a const string).  Anyway, I fixed the
crasher problem.

The protocol_message_type_b array had the serious problem it confused
the entry for byte 1 (0x01) with that of char '1' (and 2 and 3), so it
printed a lot of 'Unknown message' lines.  Clearly, the maintenance of
this array is going to be a pain point of this patch (counting number of
zeroes is no fun), and I think we're going to have some way to have an
explicit initializer, where we can do things like
    protocol_message_type_b['A'] = "NotificationResponse";
etc instead of the current way, which is messy, hard to maintain.
I'm not sure how to do that and not make things worse, however.

> (16)
> +static void
> +pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
> +{
> +    char       *message_direction = commsource == FROM_BACKEND ? "<" : ">";
> ...
> +        switch (conn->be_msg->state)
> 
> This function handles messages in both directions.  But the switch condition is based on the backend message state
(b_msg). How is this correct?
 

It's not.

I split things so that this function only prints backend messages; when
frontend messages are to be printed, we use a single fprintf() instead.
See about (20), below.

> 
> (17)
> What's the difference between pqLogMsgString() and pqLogMsgnchar()?  They both take a length argument.  Do they have
tobe separate functions?
 

nchar are not null-terminated and we escape !isprint() characters.  I'm
not sure that the difference is significant enough, but I'm also not
sure if they can be merged into one.

> (18)
>      if (conn->Pfdebug)
> -        fprintf(conn->Pfdebug, "To backend> %c\n", c);
> +        pqStoreFrontendMsg(conn, LOG_BYTE1, 1);
> 
> To match the style for backend messages with pqLogMsgByte1 etc., why don't you wrap the conn->Pfdebug check in the
macro?

I think these macros are useless and should be removed.  There's more
verbosity and confusion caused by them, than the clarity they provide.

> (20)
> +static void
> +pqLogFrontendMsg(PGconn *conn)
> ...
> +    for (i = 0; i < conn->fe_msg->field_num; i++)
> +    {
> +        message_addr = conn->fe_msg->fields[i].offset_in_buffer;
> +        length = conn->fe_msg->fields[i].length;
> +
> +        switch (conn->fe_msg[i].type)
> +        {
> +            case LOG_BYTE1:
> 
> When I saw this switch condition, the confusion exploded.  Shouldn't the type should be the attribute of each field
asfollows?
 
> 
> +        switch (conn->fe_msg->fields[i].type)
> 
> I also found the function names for frontend -> backend messages hard to grasp.  Could you show the flow of function
callswhen sending messages to the backend?
 

Exactly!  I was super confused about this too, and eventually decided to
rewrite the whole thing so that the 'type' is in the Fields struct.  And
while doing that, I noticed that it would make more sense to let the
fe_msg array be realloc'ed if it gets full, rather than making it fixed
size.  This made me add pqTraceUninit(), so that we can free the array
if it has grown, to avoid reserving arbitrarily large amounts of memory
after PQuntrace() on a session that traced large messages.

> (22)
> I couldn't understand pqLogBinaryMsg() at all.  Could you explain what it does?  Particularly, if all bytes in the
stringis printable, the function seems to do nothing:
 
> 
> +        if (isprint(v[i]))
> +            continue;
> 
> Should the following part be placed outside the for loop?  What does 'pin' mean? (I want the variable name to be more
intuitive.)
> 
> +        if (pin < length)
> +            fwrite(v + pin, 1, length - pin, conn->Pfdebug);

Yes :-(  I fixed that too.

This patch needs more work still, of course.

-- 
Álvaro Herrera       Valdivia, Chile
"Java is clearly an example of money oriented programming"  (A. Stepanov)

Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Alvaro-san, Iwata-san,


First of all, thank you Alvaro-san really a lot for your great help.  I'm glad you didn't lose interest and time for
thispatch yet.  (Iwata-san is my colleague.)
 


From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> That's true, but it'd require that we move PQtrace() to fe-misc.c, because
> pqTraceInit() uses definitions that are private to that file.

Ah, that was the reason for separation.  Then, I'm fine with either keeping the separation, or integrating the two
functionsin fe-misc.c with PQuntrace() being also there.  I kind of think the latter is better, because of code
readabilityand, because tracing facility is not a connection-related reature so categorizing it as "misc" feels
natural.


> > What's this code for?  I think it's sufficient that PQuntrace() frees b_msg
> and PQtrace() always allocates b_msg.
> 
> The point is to be able to cope with a connection that calls PQtrace() multiple
> times, with or without an intervening PQuntrace().
> We'd make no friends if we made PQtrace() crash, or leak memory if called a
> second time ...

HEAD's PQtrace() always call PQuntrace() and then re-initialize from scratch.  So, if we keep that flow, the user can
callPQtrace() multiple in a row.
 


> I think trying to apply tabs inside the message contents is going to be more
> confusing than helpful.

Agreed.


> > Plus, you may as well print the invalid message type.  Why don't you do
> something like this:
> >
> > +static const char *
> > +pqGetProtocolMsgType(unsigned char c, PGCommSource commsource) {
> > +    static char unknown_message[64];
> > +    char *msg = NULL;
> >
> > +    if (commsource == FROM_BACKEND && c <
> lengthof(protocol_message_type_b))
> > +        msg = protocol_message_type_b[c];
> > +    else if (commsource == FROM_FRONTEND && c <
> lengthof(protocol_message_type_f))
> > +        msg = protocol_message_type_f[c];
> > +
> > +    if (msg == NULL)
> > +    {
> > +        sprintf(unknown_message, "Unknown message %02x", c);
> > +        msg = unknown_message;
> > +    }
> > +
> > +    return msg;
> > +}
> 
> Right.  I had written something like this, but in the end decided not to bother
> because it doesn't seem terribly important.  You can't do exactly what you
> suggest, because it has the problem that you're returning a stack-allocated
> variable even though your stack is about to be blown away.  My copy had a
> static buffer that was malloc()ed on first use (and if allocation fails, return a
> const string).  Anyway, I fixed the crasher problem.

My suggestion included static qualifier to not use the stack, but it doesn't work anyway in multi-threaded
applications. So I agree that we don't print the invalid message type value.
 


> > (18)
> >      if (conn->Pfdebug)
> > -        fprintf(conn->Pfdebug, "To backend> %c\n", c);
> > +        pqStoreFrontendMsg(conn, LOG_BYTE1, 1);
> >
> > To match the style for backend messages with pqLogMsgByte1 etc., why
> don't you wrap the conn->Pfdebug check in the macro?
> 
> I think these macros are useless and should be removed.  There's more
> verbosity and confusion caused by them, than the clarity they provide.

Agreed.


> This patch needs more work still, of course.

Yes, she is updating the patch based on the feedback from you, Kirk-san, and me.

By the way, I just looked at the beginning of v12.

-void PQtrace(PGconn *conn, FILE *stream);
+void PQtrace(PGconn *conn, FILE *stream, bits32 flags);

Can we change the signature of an API function?



Iwata-san,
Please integrate Alvaro-san's patch with yours.  Next week is the last week in this CF, so do your best to post the
patchby next Monday or so (before Alvaro-san loses interest or time.)  Then I'll review it ASAP.
 


Regards
Takayuki Tsunakawa


Re: libpq debug log

From
'Alvaro Herrera'
Date:
Hello, just two quick comments on this,

On 2021-Jan-22, tsunakawa.takay@fujitsu.com wrote:

> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> > That's true, but it'd require that we move PQtrace() to fe-misc.c, because
> > pqTraceInit() uses definitions that are private to that file.
> 
> Ah, that was the reason for separation.  Then, I'm fine with either
> keeping the separation, or integrating the two functions in fe-misc.c
> with PQuntrace() being also there.  I kind of think the latter is
> better, because of code readability and, because tracing facility is
> not a connection-related reature so categorizing it as "misc" feels
> natural.

Maybe we can create a new file specifically for this to avoid mixing
unrelated stuff in fe-misc.c -- say fe-trace.c where all this new
tracing stuff goes.

> > The point is to be able to cope with a connection that calls PQtrace() multiple
> > times, with or without an intervening PQuntrace().
> > We'd make no friends if we made PQtrace() crash, or leak memory if called a
> > second time ...
> 
> HEAD's PQtrace() always call PQuntrace() and then re-initialize from
> scratch.  So, if we keep that flow, the user can call PQtrace()
> multiple in a row.

Oh, of course.

-- 
Álvaro Herrera       Valdivia, Chile
"No necesitamos banderas
 No reconocemos fronteras"                  (Jorge González)



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
First, some possibly major questions:


(23)
From: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>
> Maybe we can create a new file specifically for this to avoid mixing
> unrelated stuff in fe-misc.c -- say fe-trace.c where all this new
> tracing stuff goes.

What do you think about this suggestion?  I think this is reasonable, including moving PQtrace/PQuntrace to the new
file.

OTOH, fe-misc also has humble reasons to contain them.  One is that the file header comment is as follows, accepting
miscellaneousstuff.  Another is that although most functions in fe-misc.c are related to protocol data transmission and
receipt,several functions at the end of the file are already not related.
 

 *      fe-misc.c
 *
 *   DESCRIPTION
 *       miscellaneous useful functions


(24)
-void PQtrace(PGconn *conn, FILE *stream);
+void PQtrace(PGconn *conn, FILE *stream, bits32 flags);
 </synopsis>
      </para>
 
+     <para>
+      <literal>flags</literal> contains flag bits describing the operating mode
+      of tracing.  If <literal>(flags & TRACE_SUPPRESS_TIMESTAMPS)</literal> is
+      true, then timestamps are not printed with each message.
+     </para>

As I asked in the previous mail, I'm afraid we cannot change the signature of existing API functions.  If we want this
flagbits, we have to add something like PQtraceEx(), don't we?
 

The flag name differs between in the manual and in the source code:

+#define PQTRACE_SUPPRESS_TIMESTAMPS        (1 << 0)


P.S.
Also, please note this as:

> Also, why don't you try running the regression tests with a temporary modification to PQtrace() to output the trace
toa file?  The sole purpose is to confirm that this patch doesn't make the test crash (core dump).
 



Regards
Takayuki Tsunakawa


RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hello,  Alvaro san , Tsunakawa san

Thank you for your help. It was very helpful.

> Please integrate Alvaro-san's patch with yours.  Next week is the last week in this CF, so do your best to post the
patchby next Monday or so (before Alvaro-san loses interest or time.)  Then I'll review it ASAP.
 
I integrated my fix and update v12 patch.  All previous review comments apply to this patch.

Regards,
Aya Iwata
Fujitsu

Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
Iwata-san, all,


Strangely, Iwata-san's latest mail she sent today at 10:34 JST hasn't appeared on pgsql-hackers yet after more than 6
hours. It is not reflected in the CF entry [1].  So, I'm putting her original mail below.  The v13 patch attached to
theoriginal mail is attached to this mail.
 

I guess only Alvaro-san and me, who are listed in To:, received the original mail.

Iwata-san,
Please ask pgsql-www list about this problem.


[1]
https://commitfest.postgresql.org/31/2889/


Regards
Takayuki Tsunakawa


> -----Original Message-----
> From: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> Sent: Monday, January 25, 2021 10:34 AM
> To: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>; Tsunakawa, Takayuki/綱川 貴
> 之 <tsunakawa.takay@fujitsu.com>
> Cc: 'pgsql-hackers@lists.postgresql.org'
> <pgsql-hackers@lists.postgresql.org>
> Subject: RE: libpq debug log
> 
> Hello,  Alvaro san , Tsunakawa san
> 
> Thank you for your help. It was very helpful.
> 
> > Please integrate Alvaro-san's patch with yours.  Next week is the last week
> in this CF, so do your best to post the patch by next Monday or so (before
> Alvaro-san loses interest or time.)  Then I'll review it ASAP.
> I integrated my fix and update v12 patch.  All previous review comments apply
> to this patch.
> 
> Regards,
> Aya Iwata
> Fujitsu

Attachment

RE: libpq debug log

From
"k.jamison@fujitsu.com"
Date:
Hello,

I have not tested nor review the latest patch changes yet, but I am reporting the compiler errors.
I am trying to compile the patch since V12 (Alvaro's version), but the following needs to
be fixed too because of the complaints: doc changes and undeclared INT_MAX

libpq.sgml:5893: parser error : xmlParseEntityRef: no name
      of tracing.  If <literal>(flags & TRACE_SUPPRESS_TIMESTAMPS)</literal> is
                                       ^
libpq.sgml:8799: parser error : chunk is not well balanced
postgres.sgml:195: parser error : Failure to process entity libpq
  &libpq;
         ^
postgres.sgml:195: parser error : Entity 'libpq' not defined
  &libpq;
         ^
...
fe-misc.c: In function ‘pqStoreFrontendMsg’:
fe-misc.c:903:35: error: ‘INT_MAX’ undeclared (first use in this function)
    if (conn->fe_msg->max_fields > INT_MAX / 2)


Regards,
Kirk Jamison

RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Tsunakawa san, 

> Strangely, Iwata-san's latest mail she sent today at 10:34 JST hasn't appeared
> on pgsql-hackers yet after more than 6 hours.  It is not reflected in the CF
> entry [1].  So, I'm putting her original mail below.  The v13 patch attached to
> the original mail is attached to this mail.

I solved this problem. Thank you for sharing my email.

Regards,
Aya Iwata
Fujitsu

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
Iwata-san,


(25)
+        conn->fe_msg = malloc(sizeof(offsetof(pqFrontendMessage, fields)) +
+                              DEF_FE_MSGFIELDS * sizeof(pqFrontendMessageField));

It's incorrect that offsetof() is nested in sizeof().  See my original review comment.


(26)
+bool
+pqTraceInit(PGconn *conn, bits32 flags)
+{
...
+        conn->be_msg->state = LOG_FIRST_BYTE;
+        conn->be_msg->length = 0;
+    }
...
+    conn->be_msg->state = LOG_FIRST_BYTE;
+    conn->be_msg->length = 0;

The former is redundant?


(27)
+/*
+ * Deallocate frontend-message tracking memory.  We only do this because
+ * it can grow arbitrarily large, and skip it in the initial state, because
+ * it's likely pointless.
+ */
+void
+pqTraceUninit(PGconn *conn)
+{
+    if (conn->fe_msg &&
+        conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)
+    {
+        pfree(conn->fe_msg);
+        conn->fe_msg = NULL;
+    }
+}

I thought this function plays the reverse role of pqTraceInit(), but it only frees memory for frontend messages.  Plus,
usefree() instead of pfree(), because malloc() is used to allocate memory.
 


(28)
+void PQtrace(PGconn *conn, FILE *stream, bits32 flags);

bits32 can't be used because it's a data type defined in c.h, which should not be exposed to applications.  I think you
canjust int or something.
 


Considering these and the compilation error Kirk-san found, I'd like you to do more self-review before I resume this
review.


Regards
Takayuki Tsunakawa



Re: libpq debug log

From
'Alvaro Herrera'
Date:
On 2021-Jan-25, tsunakawa.takay@fujitsu.com wrote:

> Iwata-san,

[...]

> Considering these and the compilation error Kirk-san found, I'd like
> you to do more self-review before I resume this review.

Kindly note that these errors are all mine.

Thanks for the review

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)



RE: libpq debug log

From
"k.jamison@fujitsu.com"
Date:
On Mon, Jan 25, 2021 10:11 PM (JST), Alvaro Herrera wrote:
> On 2021-Jan-25, tsunakawa.takay@fujitsu.com wrote:
> 
> > Iwata-san,
> 
> [...]
> 
> > Considering these and the compilation error Kirk-san found, I'd like
> > you to do more self-review before I resume this review.
> 
> Kindly note that these errors are all mine.
> 
> Thanks for the review
> 
Hello,
To make the CFBot happy, I fixed the compiler errors.
I have not included here the change proposal to move the tracing functions
to a new file: fe-trace.c or something, so I retained the changes .
Maybe Iwata-san can consider the proposal.
Currently, both pqTraceInit() and pqTraceUninit() are called only by one function.

Summary:
1. I changed the bits32 flags to int flags
2. I assumed INT_MAX value is the former MAX_FRONTEND_MSGS
   I defined it to solve the compile error. Please tell if the value was wrong.
3. Fixed the doc errors
4. Added freeing of be_msg in pqTraceUninit()
5. Addressed the following comments: 

> (25)
> +        conn->fe_msg =
> malloc(sizeof(offsetof(pqFrontendMessage, fields)) +
> +
> DEF_FE_MSGFIELDS * sizeof(pqFrontendMessageField));
> 
> It's incorrect that offsetof() is nested in sizeof().  See my original 
> review comment.

I removed the initial sizeof().


> (26)
> +bool
> +pqTraceInit(PGconn *conn, bits32 flags) {
> ...
> +        conn->be_msg->state = LOG_FIRST_BYTE;
> +        conn->be_msg->length = 0;
> +    }
> ...
> +    conn->be_msg->state = LOG_FIRST_BYTE;
> +    conn->be_msg->length = 0;
> 
> The former is redundant?

Right. I've removed the former.


> (27)
> +/*
> + * Deallocate frontend-message tracking memory.  We only do this 
> +because
> + * it can grow arbitrarily large, and skip it in the initial state, 
> +because
> + * it's likely pointless.
> + */
> +void
> +pqTraceUninit(PGconn *conn)
> +{
> +    if (conn->fe_msg &&
> +        conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)
> +    {
> +        pfree(conn->fe_msg);
> +        conn->fe_msg = NULL;
> +    }
> +}
> 
> I thought this function plays the reverse role of pqTraceInit(), but 
> it only frees memory for frontend messages.  Plus, use free() instead 
> of pfree(), because
> malloc() is used to allocate memory.

Fixed to use free(). Also added the freeing of be_msg.

> (28)
> +void PQtrace(PGconn *conn, FILE *stream, bits32 flags);
> 
> bits32 can't be used because it's a data type defined in c.h, which 
> should not be exposed to applications.  I think you can just int or something.

I used int. It still works to suppress the timestamp when flags is true.

In my environment, when flags is false(0):
2021-01-28 06:26:11.368 >   Query   27 "COPY country TO STDOUT"
2021-01-28 06:26:11.368 >   Query   -1
2021-01-28 06:26:11.369 <   CopyOutResponse 13 \x00 #3 #0 #0 #0
2021-01-28 06:26:11.369 <   CopyDone    4

2021-01-28 06:26:11.369 <   CopyDone    4
2021-01-28 06:26:11.369 <   CopyDone    4
2021-01-28 06:26:11.369 <   CommandComplete 11 "COPY 0"
2021-01-28 06:26:11.369 <   ReadyForQuery   5
2021-01-28 06:26:11.369 >   Terminate   4
2021-01-28 06:26:11.369 >   Terminate   -1

When flags is true(1), running the same query:
>   Query   27 "COPY country TO STDOUT"
>   Query   -1
<   CopyOutResponse 13 \x00 #3 #0 #0 #0
<   CopyDone    4

<   CopyDone    4
<   CopyDone    4
<   CommandComplete 11 "COPY 0"
<   ReadyForQuery   5
>   Terminate   4
>   Terminate   -1


Regards,
Kirk Jamison

Attachment

RE: libpq debug log

From
"k.jamison@fujitsu.com"
Date:
On Mon, January 25, 2021 4:13 PM (JST), Tsunakawa-san wrote:.
> Also, please note this as:
> 
> > Also, why don't you try running the regression tests with a temporary
> modification to PQtrace() to output the trace to a file?  The sole purpose is
> to confirm that this patch doesn't make the test crash (core dump).

I realized that existing libpq regression tests in src/test/examples do not
test PQtrace(). At the same time, no other functions Is calling PQtrace(),
so it's expected that by default if there are no compilation errors, then it
will pass all the tests. And it did. 

So to check that the tracing feature is enabled and, as requested, outputs
a trace file, I temporarily modified a bit of testlibpq.c instead **based from
my current environment settings**, and ran the regression test.

diff --git a/src/test/examples/testlibpq.c b/src/test/examples/testlibpq.c
index 0372781..ae163e4 100644
--- a/src/test/examples/testlibpq.c
+++ b/src/test/examples/testlibpq.c
@@ -26,6 +26,7 @@ main(int argc, char **argv)
        int                     nFields;
        int                     i,
                                j;
+       FILE            *trace_file;

        /*
         * If the user supplies a parameter on the command line, use it as the
@@ -40,6 +41,9 @@ main(int argc, char **argv)
        /* Make a connection to the database */
        conn = PQconnectdb(conninfo);

+       trace_file = fopen("/home/postgres/tracelog/trace.out","w");
+       PQtrace(conn, trace_file, 0);
+
        /* Check to see that the backend connection was successfully made */
        if (PQstatus(conn) != CONNECTION_OK)
        {
@@ -127,5 +131,6 @@ main(int argc, char **argv)
        /* close the connection to the database and cleanup */
        PQfinish(conn);

+       fclose(trace_file);
        return 0;
 }


make -C src/test/examples/ PROGS=testlibpq
./src/test/examples/testlibpq


I've verified that it works (no crash) and outputs a trace file to the PATH that I set in libpqtest.

The trace file contains the following log for the testlibq:

2021-01-28 09:22:28.155 >   Query   59 "SELECT pg_catalog.set_config('search_path', '', false)"
2021-01-28 09:22:28.156 >   Query   -1
2021-01-28 09:22:28.157 <   RowDescription  35 #1 "set_config" 0 #0 25 #65535 -1 #0
2021-01-28 09:22:28.157 <   DataRow 10 #1 0
2021-01-28 09:22:28.157 <   CommandComplete 13 "SELECT 1"
2021-01-28 09:22:28.157 <   ReadyForQuery   5
2021-01-28 09:22:28.157 <   ReadyForQuery   5 I
2021-01-28 09:22:28.157 >   Query   10 "BEGIN"
2021-01-28 09:22:28.157 >   Query   -1
2021-01-28 09:22:28.157 <   CommandComplete 10 "BEGIN"
2021-01-28 09:22:28.157 <   ReadyForQuery   5
2021-01-28 09:22:28.157 <   ReadyForQuery   5 T
2021-01-28 09:22:28.157 >   Query   58 "DECLARE myportal CURSOR FOR select * from pg_database"
2021-01-28 09:22:28.157 >   Query   -1
2021-01-28 09:22:28.159 <   CommandComplete 19 "DECLARE CURSOR"
2021-01-28 09:22:28.159 <   ReadyForQuery   5
2021-01-28 09:22:28.159 <   ReadyForQuery   5 T
2021-01-28 09:22:28.159 >   Query   26 "FETCH ALL in myportal"
2021-01-28 09:22:28.159 >   Query   -1
2021-01-28 09:22:28.159 <   RowDescription  405 #14 "oid" 1262 #1 26 #4 -1 #0 "datname" 1262 #2 19 #64 -1 #0 "datdba"
1262#3 26 #4 -1 #0 "encoding" 1262 #4 23 #4 -1 #0 "datcollate" 1262 #5 19 #64 -1 #0 "datctype" 1262 #6 19 #64 -1 #0
"datistemplate"1262 #7 16 #1 -1 #0 "datallowconn" 1262 #8 16 #1 -1 #0 "datconnlimit" 1262 #9 23 #4 -1 #0
"datlastsysoid"1262 #10 26 #4 -1 #0 "datfrozenxid" 1262 #11 28 #4 -1 #0 "datminmxid" 1262 #12 28 #4 -1 #0
"dattablespace"1262 #13 26 #4 -1 #0 "datacl" 1262 #14 1034 #65535 -1 #0
 
2021-01-28 09:22:28.159 <   DataRow 117 #14 5 '13756' 8 'postgres' 2 '10' 1 '6' 11 'en_US.UTF-8' 11 'en_US.UTF-8' 1 'f'
1't' 2 '-1' 5 '13755' 3 '502' 1 '1' 4 '1663' -1
 
2021-01-28 09:22:28.159 <   DataRow 149 #14 1 '1' 9 'template1' 2 '10' 1 '6' 11 'en_US.UTF-8' 11 'en_US.UTF-8' 1 't' 1
't'2 '-1' 5 '13755' 3 '502' 1 '1' 4 '1663' 35 '{=c/postgres,postgres=CTc/postgres}'
 
2021-01-28 09:22:28.159 <   DataRow 153 #14 5 '13755' 9 'template0' 2 '10' 1 '6' 11 'en_US.UTF-8' 11 'en_US.UTF-8' 1
't'1 'f' 2 '-1' 5 '13755' 3 '502' 1 '1' 4 '1663' 35 '{=c/postgres,postgres=CTc/postgres}'
 
2021-01-28 09:22:28.159 <   CommandComplete 12 "FETCH 3"
2021-01-28 09:22:28.159 <   ReadyForQuery   5
2021-01-28 09:22:28.159 <   ReadyForQuery   5 T
2021-01-28 09:22:28.159 >   Query   19 "CLOSE myportal"
2021-01-28 09:22:28.159 >   Query   -1
2021-01-28 09:22:28.159 <   CommandComplete 17 "CLOSE CURSOR"
2021-01-28 09:22:28.159 <   ReadyForQuery   5
2021-01-28 09:22:28.159 <   ReadyForQuery   5 T
2021-01-28 09:22:28.159 >   Query   8 "END"
2021-01-28 09:22:28.159 >   Query   -1
2021-01-28 09:22:28.160 <   CommandComplete 11 "COMMIT"
2021-01-28 09:22:28.160 <   ReadyForQuery   5
2021-01-28 09:22:28.160 <   ReadyForQuery   5 I
2021-01-28 09:22:28.160 >   Terminate   4
2021-01-28 09:22:28.160 >   Terminate   -1


Regards,
Kirk Jamison

Re: libpq debug log

From
'Alvaro Herrera'
Date:
On 2021-Jan-28, k.jamison@fujitsu.com wrote:

> I realized that existing libpq regression tests in src/test/examples do not
> test PQtrace(). At the same time, no other functions Is calling PQtrace(),
> so it's expected that by default if there are no compilation errors, then it
> will pass all the tests. And it did. 
> 
> So to check that the tracing feature is enabled and, as requested, outputs
> a trace file, I temporarily modified a bit of testlibpq.c instead **based from
> my current environment settings**, and ran the regression test.

Yeah.  It seems useful to build a real test harness based on the new
tracing functionality.  And that is precisely why I added the option to
suppress timestamps: so that we can write trace files that do not vary
from run to run.  That allows us to have an "expected" trace to which we
can compare the trace file produced by the actual run.  I had the idea
that instead of a timestamp we would print a message counter.  I didn't
implement that, however.

So what we need to do now, before we cast in stone such expected files,
is ensure that the trace file produced by some program (be it
testlibpq.c or some other program) accurately matches what is sent over
the network.  If the tracing infrastructure has bugs, then the trace
will contain artifacts, and that's something to avoid.  For example, in
the trace you paste, why are there two "Query" messages every time, with
the second one having length -1?  I think this is a bug I introduced
recently.

> 2021-01-28 09:22:28.155 >   Query   59 "SELECT pg_catalog.set_config('search_path', '', false)"
> 2021-01-28 09:22:28.156 >   Query   -1
> 2021-01-28 09:22:28.157 <   RowDescription  35 #1 "set_config" 0 #0 25 #65535 -1 #0
> 2021-01-28 09:22:28.157 <   DataRow 10 #1 0
> 2021-01-28 09:22:28.157 <   CommandComplete 13 "SELECT 1"

And afterwards, why are always there two ReadyForQuery messages?

> 2021-01-28 09:22:28.157 <   ReadyForQuery   5
> 2021-01-28 09:22:28.157 <   ReadyForQuery   5 I

In ReadyForQuery, we also get a transaction status.  In this case it's
"I" which means "Idle" (pretty mysterious if you don't know, as was my
case).  A bunch of other values are possible.  Do we want to translate
this to human-readable flags, similarly to how we translate message tag
chars to message names?  Seems useful to me.  Or maybe it's
overengineering, about which see below.

> 2021-01-28 09:22:28.159 <   RowDescription  405 #14 "oid" 1262 #1 26 #4 -1 #0 "datname" 1262 #2 19 #64 -1 #0 "datdba"
1262#3 26 #4 -1 #0 "encoding" 1262 #4 23 #4 -1 #0 "datcollate" 1262 #5 19 #64 -1 #0 "datctype" 1262 #6 19 #64 -1 #0
"datistemplate"1262 #7 16 #1 -1 #0 "datallowconn" 1262 #8 16 #1 -1 #0 "datconnlimit" 1262 #9 23 #4 -1 #0
"datlastsysoid"1262 #10 26 #4 -1 #0 "datfrozenxid" 1262 #11 28 #4 -1 #0 "datminmxid" 1262 #12 28 #4 -1 #0
"dattablespace"1262 #13 26 #4 -1 #0 "datacl" 1262 #14 1034 #65535 -1 #0
 

This is terribly unreadable, but at this point that's okay because we're
just doing tracing at a very low level.  In the future we could add some
new flag PQTRACE_INTERPRET_MESSAGES or something, which changes the
output format to something more readable.  Or maybe nobody cares to
spend time doing that.

Cheers

-- 
Álvaro Herrera       Valdivia, Chile
"Para tener más hay que desear menos"



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Jamison, Kirk/ジャミソン カーク <k.jamison@fujitsu.com>
> To make the CFBot happy, I fixed the compiler errors.
> I have not included here the change proposal to move the tracing functions to a
> new file: fe-trace.c or something, so I retained the changes .
> Maybe Iwata-san can consider the proposal.
> Currently, both pqTraceInit() and pqTraceUninit() are called only by one
> function.

I confirmed all mentioned points have been fixed.  Additional comments are:


(29)
-void PQtrace(PGconn *conn, FILE *stream);
+void PQtrace(PGconn *conn, FILE *stream, int flags);

As I said before, I'm afraid we cannot change the signature of existing API functions.  Please leave the signature of
thisfunction unchanged, and add a new function like PQtraceEx() that adds the flags argument.
 

I guess the purpose of adding the flag argument is to not output the timestamp by default, because getting timestamps
wouldincur significant overhead (however, I'm not sure if that overhead is worth worrying about given the already
incurredlogging overhead.)  So, I think it's better to have a flag like PQTRACE_OUTPUT_TIMESTAMPS instead of
PQTRACE_SUPPRESS_TIMESTAMPS,and the functions would look like:
 

void
PQtrace(PGconn *conn, FILE *stream)
{
    PQtraceEx(conn, stream, 0);
}

void
PQtraceEx(PGconn *conn, FILE *stream, int flags)
{
    ... the new implementation in the patch
}

Remember to add PQtraceEx in exports.txt and make sure it builds on Windows with Visual Studio.

But... can you evaluate the overhead for getting timestamps and see if we can turn it on by default, or further, if we
needsuch a flag and PQtraceEx()?  For example, how about comparing the times to run the regression test with and
withouttimestamps?
 


(30)
+/*
+ * Deallocate FE/BE message tracking memory.  We only do this because
+ * FE message can grow arbitrarily large, and skip it in the initial state,
+ * because it's likely pointless.
+ */
+void
+pqTraceUninit(PGconn *conn)
+{
+    if (conn->fe_msg &&
+        conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)
+    {
+        free(conn->fe_msg);
+        conn->fe_msg = NULL;
+    }

What does the second if condition mean?  If it's not necessary, change the comment accordingly.


(31)
@@ -1011,11 +1615,16 @@ pqSendSome(PGconn *conn, int len)
 int
 pqFlush(PGconn *conn)
 {
-    if (conn->Pfdebug)
-        fflush(conn->Pfdebug);
-
     if (conn->outCount > 0)
+    {
+        /* XXX comment */
+        if (conn->Pfdebug)
+        {
+            pqLogFrontendMsg(conn, -1);
+            fflush(conn->Pfdebug);
+        }
         return pqSendSome(conn, conn->outCount);
+    }

Remove the comment line.


(32)
+#define INT_MAX    (2048/2)        /* maximum array size */

INT_MAX is available via:
#include <limits.h>


(33)
     /* realloc failed. Probably out of memory */
-    appendPQExpBufferStr(&conn->errorMessage,
-                         "cannot allocate memory for input buffer\n");
+    appendPQExpBuffer(&conn->errorMessage,
+                      "cannot allocate memory for input buffer\n");

This change appears to be irrelevant.


(34)
+static void
+pqStoreFeMsgStart(PGconn *conn, char type)
+{
+    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+        conn->fe_msg->msg_type = type;
+}

Protocol version check is unnecessary because this tracing takes effect only in v3 protocol?


(35)
+        conn->fe_msg =
+            realloc(conn->fe_msg,
+                    sizeof(pqFrontendMessage) +
+                    2 * conn->fe_msg->max_fields * sizeof(pqFrontendMessageField));

Align this size calculation with that in pqTraceInit().


(36)
+    /* append milliseconds */
+    sprintf(timestr + strlen(timestr), ".%03d", (int) (tval.tv_usec / 1000));

Output microsecond instead.  As your example output shows, many lines show the exactly same timestamps and thus they
arenot distinguishable in time.
 


(37)
+static char *
+pqLogFormatTimestamp(char *timestr)
+{
...
+#define FORMATTED_TS_LEN 128
+    strftime(timestr, FORMATTED_TS_LEN,

Add an argument to this function that indicates the size of timestr.  Define FORMATTED_TS_LEN globally in this source
file,and have callers use it.  That is, the code like:
 

+        char        timestr[128];
+
+        fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
+                pqLogFormatTimestamp(timestr),

becomes:

+        char        timestr[FORMATTED_TS_LEN];
+
+        fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
+                pqLogFormatTimestamp(timestr, sizeof(timestr)),


(38)
+    if ((conn->traceFlags & PQTRACE_SUPPRESS_TIMESTAMPS) == 0)
+    {
+        char        timestr[128];
+
+        fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
+                pqLogFormatTimestamp(timestr),
+                pqGetProtocolMsgType(conn->fe_msg->msg_type,
+                                     MSGDIR_FROM_FRONTEND),
+                msgLen);
+    }
+    else
+        fprintf(conn->Pfdebug, ">\t%s\t%d",
+                pqGetProtocolMsgType(conn->fe_msg->msg_type,
+                                     MSGDIR_FROM_FRONTEND),
+                msgLen);

To reduce the code for easier modification, change the above code to something like:

+    char        timestr[128];
+    char        *timestr_p = "";
+
+    if ((conn->traceFlags & PQTRACE_SUPPRESS_TIMESTAMPS) == 0)
+        timestr_p = pqLogFormatTimestamp(timestr);
+    fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
+            timestr_p,
+            pqGetProtocolMsgType(conn->fe_msg->msg_type,
+                                 MSGDIR_FROM_FRONTEND),
+            msgLen);


Regards
Takayuki Tsunakawa




RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Jamison, Kirk/ジャミソン カーク <k.jamison@fujitsu.com>
> I realized that existing libpq regression tests in src/test/examples do not
> test PQtrace(). At the same time, no other functions Is calling PQtrace(),
> so it's expected that by default if there are no compilation errors, then it
> will pass all the tests. And it did.
> 
> So to check that the tracing feature is enabled and, as requested, outputs
> a trace file, I temporarily modified a bit of testlibpq.c instead **based from
> my current environment settings**, and ran the regression test.

To run the tracing code in much more extensive cases, can you try running the whole SQL regression test with the
tracingenabled?  That is, run "make check-world" in the top directory of the source tree.
 

To enable tracing in every connection, you can probably insert PQtrace() call just before the return statement here in
connectDBComplete(). If you have enough disk space, it's better to accumulate the trace output by passing "a" to
fopen().

        switch (flag)
        {
            case PGRES_POLLING_OK:
                return 1;       /* success! */



Regards
Takayuki Tsunakawa


RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi all,

Thank you Kirk san for creating the v14 patch.
I update the patch.  I fixed all of Tsunakawa san's review comments. 
I am trying to solve three bugs. Two bags were pointed out by Alvaro san
in a previous e-mail. And I found one bug. 

> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Sent: Friday, January 22, 2021 6:53 AM
...
> > (5)
> > @@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char
> *name, const char *value)
> >      pgParameterStatus *pstatus;
> >      pgParameterStatus *prev;
> >
> > -    if (conn->Pfdebug)
> > -        fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' =
> '%s'\n",
> > -                name, value);
> > -
> >
> > Where is this information output instead?
> 
> Hmm, seems to have been lost.  I restored it, but didn't verify
> the resulting behavior carefully.

I checked log output using Matsumura san's application app.c;
---
2021-01-27 11:29:49.718 <       ParameterStatus 22 "application_name" ""
pqSaveParameterStatus: 'application_name' = ''
2021-01-27 11:29:49.718 <       ParameterStatus 25 "client_encoding" "UTF8"
pqSaveParameterStatus: 'client_encoding' = 'UTF8'
---

New trace log prints "ParameterStatus" which report run-time parameter status.
And new log also prints parameter name and value in " StartupMessage " message.
We can know the parameter status using these protocol messages.
So I think "pqSaveParameterStatus:" is not necessary to output.

In StartupMessage, the protocol message is output as "UnknownMessage" like this. 
 [...] >       UnknownMessage  38 '\x00\x03\x00\x00user\x00iwata\x00database\x00postgres\x00\x00'

To fix this, I want to move a protocol message without the first byte1 to a frontend message logging function.
I will work on this. 

> From: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>
> Sent: Friday, January 22, 2021 10:38 PM
> > Ah, that was the reason for separation.  Then, I'm fine with either
> > keeping the separation, or integrating the two functions in fe-misc.c
> > with PQuntrace() being also there.  I kind of think the latter is
> > better, because of code readability and, because tracing facility is
> > not a connection-related reature so categorizing it as "misc" feels
> > natural.
> 
> Maybe we can create a new file specifically for this to avoid mixing
> unrelated stuff in fe-misc.c -- say fe-trace.c where all this new
> tracing stuff goes.

I moved PQtrace() from fe-connect.c to fe-misc.c.
And I agree with creating new file for new tracing functions. I will do this.

> From: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>
> Sent: Thursday, January 28, 2021 11:14 PM
...
> On 2021-Jan-28, k.jamison@fujitsu.com wrote:
> 
> > I realized that existing libpq regression tests in src/test/examples do not
> > test PQtrace(). At the same time, no other functions Is calling PQtrace(),
> > so it's expected that by default if there are no compilation errors, then it
> > will pass all the tests. And it did.
> >
> > So to check that the tracing feature is enabled and, as requested, outputs
> > a trace file, I temporarily modified a bit of testlibpq.c instead **based from
> > my current environment settings**, and ran the regression test.
> 
> Yeah.  It seems useful to build a real test harness based on the new
> tracing functionality.  And that is precisely why I added the option to
> suppress timestamps: so that we can write trace files that do not vary
> from run to run.  That allows us to have an "expected" trace to which we
> can compare the trace file produced by the actual run.  I had the idea
> that instead of a timestamp we would print a message counter.  I didn't
> implement that, however.

I think it is a useful suggestion. However I couldn't think of how to find out
if the logs were really in the correct order. And implementing this change looks very complicated.
So I would like to brush up this patch at first.

> (29)
> -void PQtrace(PGconn *conn, FILE *stream);
> +void PQtrace(PGconn *conn, FILE *stream, int flags);
> 
> As I said before, I'm afraid we cannot change the signature of existing API
> functions.  Please leave the signature of this function unchanged, and add a
> new function like PQtraceEx() that adds the flags argument.
> 
> I guess the purpose of adding the flag argument is to not output the timestamp
> by default, because getting timestamps would incur significant overhead
> (however, I'm not sure if that overhead is worth worrying about given the
> already incurred logging overhead.)  So, I think it's better to have a flag like
> PQTRACE_OUTPUT_TIMESTAMPS instead of
> PQTRACE_SUPPRESS_TIMESTAMPS, and the functions would look like:
> 
> void
> PQtrace(PGconn *conn, FILE *stream)
> {
>     PQtraceEx(conn, stream, 0);
> }
> 
> void
> PQtraceEx(PGconn *conn, FILE *stream, int flags)
> {
>     ... the new implementation in the patch
> }
> 
> Remember to add PQtraceEx in exports.txt and make sure it builds on
> Windows with Visual Studio.

I fixed just add new function. 
I will look into similar changes so far and give PQtraceEx() a better name. 


> But... can you evaluate the overhead for getting timestamps and see if we can
> turn it on by default, or further, if we need such a flag and PQtraceEx()?  For
> example, how about comparing the times to run the regression test with and
> without timestamps?

I will check it. And decide which is better by default.
In my understanding, one of the purpose of this flag is to make sure 
the log order is correct regardless of timestamp. (for example, it is useful auto-regression test)
I think PQtraceEx() and this flag is necessary as a developer option at least.

> (30)
> +/*
> + * Deallocate FE/BE message tracking memory.  We only do this because
> + * FE message can grow arbitrarily large, and skip it in the initial state,
> + * because it's likely pointless.
> + */
> +void
> +pqTraceUninit(PGconn *conn)
> +{
> +    if (conn->fe_msg &&
> +        conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)
> +    {
> +        free(conn->fe_msg);
> +        conn->fe_msg = NULL;
> +    }
> 
> What does the second if condition mean?  If it's not necessary, change the
> comment accordingly.

I fix the comment and remove second if.


> (31)
> @@ -1011,11 +1615,16 @@ pqSendSome(PGconn *conn, int len)
>  int
>  pqFlush(PGconn *conn)
>  {
> -    if (conn->Pfdebug)
> -        fflush(conn->Pfdebug);
> -
>      if (conn->outCount > 0)
> +    {
> +        /* XXX comment */
> +        if (conn->Pfdebug)
> +        {
> +            pqLogFrontendMsg(conn, -1);
> +            fflush(conn->Pfdebug);
> +        }
>          return pqSendSome(conn, conn->outCount);
> +    }
> 
> Remove the comment line.

I remove it.

> (32)
> +#define INT_MAX    (2048/2)        /* maximum array size */
> 
> INT_MAX is available via:
> #include <limits.h>

I fixed it.

> (33)
>      /* realloc failed. Probably out of memory */
> -    appendPQExpBufferStr(&conn->errorMessage,
> -                         "cannot allocate memory
> for input buffer\n");
> +    appendPQExpBuffer(&conn->errorMessage,
> +                      "cannot allocate memory for input
> buffer\n");
> 
> This change appears to be irrelevant.

I remove this code.

> (34)
> +static void
> +pqStoreFeMsgStart(PGconn *conn, char type)
> +{
> +    if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
> +        conn->fe_msg->msg_type = type;
> +}
> 
> Protocol version check is unnecessary because this tracing takes effect only in
> v3 protocol?

Yes, in PQtrace(), it is checked whether protocol 2.0 or 3.0 and tracing works only in v3.0. I remove it.


> (35)
> +        conn->fe_msg =
> +            realloc(conn->fe_msg,
> +                    sizeof(pqFrontendMessage) +
> +                    2 * conn->fe_msg->max_fields *
> sizeof(pqFrontendMessageField));
> 
> Align this size calculation with that in pqTraceInit().

I fixed it like this;
+               conn->fe_msg =
+                       realloc(conn->fe_msg,
+                                       offsetof(pqFrontendMessage, fields) +
+                                       2 * conn->fe_msg->max_fields * sizeof(pqFrontendMessageField));


> (36)
> +    /* append milliseconds */
> +    sprintf(timestr + strlen(timestr), ".%03d", (int) (tval.tv_usec / 1000));
> 
> Output microsecond instead.  As your example output shows, many lines
> show the exactly same timestamps and thus they are not distinguishable in
> time.

I changed %03d to %06d for microsecond.

> (37)
> +static char *
> +pqLogFormatTimestamp(char *timestr)
> +{
> ...
> +#define FORMATTED_TS_LEN 128
> +    strftime(timestr, FORMATTED_TS_LEN,
> 
> Add an argument to this function that indicates the size of timestr.  Define
> FORMATTED_TS_LEN globally in this source file, and have callers use it.
> That is, the code like:
> 
> +        char        timestr[128];
> +
> +        fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
> +                pqLogFormatTimestamp(timestr),
> 
> becomes:
> 
> +        char        timestr[FORMATTED_TS_LEN];
> +
> +        fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
> +                pqLogFormatTimestamp(timestr,
> sizeof(timestr)),

I fixed it.

> (38)
> +    if ((conn->traceFlags & PQTRACE_SUPPRESS_TIMESTAMPS) == 0)
> +    {
> +        char        timestr[128];
> +
> +        fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
> +                pqLogFormatTimestamp(timestr),
> +
>     pqGetProtocolMsgType(conn->fe_msg->msg_type,
> +
> MSGDIR_FROM_FRONTEND),
> +                msgLen);
> +    }
> +    else
> +        fprintf(conn->Pfdebug, ">\t%s\t%d",
> +
>     pqGetProtocolMsgType(conn->fe_msg->msg_type,
> +
> MSGDIR_FROM_FRONTEND),
> +                msgLen);
> 
> To reduce the code for easier modification, change the above code to
> something like:
> 
> +    char        timestr[128];
> +    char        *timestr_p = "";
> +
> +    if ((conn->traceFlags & PQTRACE_SUPPRESS_TIMESTAMPS) == 0)
> +        timestr_p = pqLogFormatTimestamp(timestr);
> +    fprintf(conn->Pfdebug, "%s\t>\t%s\t%d",
> +            timestr_p,
> +            pqGetProtocolMsgType(conn->fe_msg->msg_type,
> +
> MSGDIR_FROM_FRONTEND),
> +            msgLen);

I fixed it.

This patch should address the following:
1. fix 3 bugs
  1.1 -1 output in "Query" message
  1.2 two message output in "ReadyForQuery" message
  1.3 "StartupMessage" output as " UnknownMessage " 
2. creating new file for new tracing functions

Regards,
Aya Iwata
Fujitsu

Attachment

Re: libpq debug log

From
'Alvaro Herrera'
Date:
On 2021-Jan-29, tsunakawa.takay@fujitsu.com wrote:

> (30)
> +/*
> + * Deallocate FE/BE message tracking memory.  We only do this because
> + * FE message can grow arbitrarily large, and skip it in the initial state,
> + * because it's likely pointless.
> + */
> +void
> +pqTraceUninit(PGconn *conn)
> +{
> +    if (conn->fe_msg &&
> +        conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)
> +    {
> +        free(conn->fe_msg);
> +        conn->fe_msg = NULL;
> +    }
> 
> What does the second if condition mean?  If it's not necessary, change the comment accordingly.

The rationale for that second condition is this: if the memory allocated
is the initial size, we don't free memory, because it would just be
allocated of the same size next time, and that size is not very big, so
it's not a big deal if we just let it be, so that it is reused if we
call PQtrace() again later.  However, if the allocated size is larger
than default, then it is possible that some previous tracing run has
enlarged the trace struct to a very large amount of memory, and we don't
want to leave that in place.

-- 
Álvaro Herrera       Valdivia, Chile



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
(39)
+      of tracing.  If (<literal>flags</literal> & <literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>) is
+      true, then timestamp is not printed with each message.

The flag name (OUTPUT) and its description (not printed) doesn't match.

I think you can use less programmatical expression like "If <literal>flags</literal> contains
<literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>".


(40)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
+PQtraceEx                 180
\ No newline at end of file

What's the second line?  Isn't the file missing an empty line at the end?

(41)
+void
+PQtraceEx(PGconn *conn, FILE *debug_port, int flags)
+{
+    if (conn == NULL)
+        return;
...
+    if (!debug_port)
+        return;

The if should better be as follows to match the style of existing surrounding code.

+    if (debug_port == NULL)


(42)
+pqLogFormatTimestamp(char *timestr, Size ts_len)

I think you should use int or size_t instead of Size here, because other libpq code uses them.  int should be enough.
Ifthe compiler gives warnings, prepend "(int)" before sizeof() at call sites.
 


(43)
+    /* append microseconds */
+    sprintf(timestr + strlen(timestr), ".%06d", (int) (tval.tv_usec / 1000));

"/ 1000" should be removed.


(44)
+    if ((conn->traceFlags & PQTRACE_OUTPUT_TIMESTAMPS) == 0)
+        timestr_p = pqLogFormatTimestamp(timestr, sizeof(timestr));

== 0 should be removed.


Regards
Takayuki Tsunakawa


RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>
> > +        conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)

> The rationale for that second condition is this: if the memory allocated
> is the initial size, we don't free memory, because it would just be
> allocated of the same size next time, and that size is not very big, so
> it's not a big deal if we just let it be, so that it is reused if we
> call PQtrace() again later.  However, if the allocated size is larger
> than default, then it is possible that some previous tracing run has
> enlarged the trace struct to a very large amount of memory, and we don't
> want to leave that in place.

Ah, understood.  In that case, num_fields should be max_fields.

This has reminded me that freePGconn() should free be_msg and fe_msg.


Regards
Takayuki Tsunakawa



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Wed, 3 Feb 2021 01:26:41 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> (41)
> +void
> +PQtraceEx(PGconn *conn, FILE *debug_port, int flags)
> +{
> +    if (conn == NULL)
> +        return;
> ...
> +    if (!debug_port)
> +        return;
> 
> The if should better be as follows to match the style of existing surrounding code.
> 
> +    if (debug_port == NULL)

I don't have particular preference here, but FWIW the current
PQuntrace is doing this:

> void
> PQuntrace(PGconn *conn)
> {
>     if (conn == NULL)
>         return;
>     if (conn->Pfdebug)
>     {
>         fflush(conn->Pfdebug);

> (44)
> +    if ((conn->traceFlags & PQTRACE_OUTPUT_TIMESTAMPS) == 0)
> +        timestr_p = pqLogFormatTimestamp(timestr, sizeof(timestr));
> 
> == 0 should be removed.

Looking the doc mentioned in the comment #39:

+      <literal>flags</literal> contains flag bits describing the operating mode
+      of tracing.  If (<literal>flags</literal> & <literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>) is
+      true, then timestamp is not printed with each message.

PQTRACE_OUTPUT_TIMESTAMPS is designed to *inhibit* timestamps from
being prepended. If that is actually intended, the symbol name should
be PQTRACE_NOOUTPUT_TIMESTAMP. Otherwise, the doc need to be fixed.

By the way removing "== 0" makes it difficult to tell whether the
condition is correct or not; I recommend to use '!= 0" rather than
removing '== 0'.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Alvaro Herrera
Date:
On 2021-Feb-03, Kyotaro Horiguchi wrote:

> Looking the doc mentioned in the comment #39:
> 
> +      <literal>flags</literal> contains flag bits describing the operating mode
> +      of tracing.  If (<literal>flags</literal> & <literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>) is
> +      true, then timestamp is not printed with each message.
> 
> PQTRACE_OUTPUT_TIMESTAMPS is designed to *inhibit* timestamps from
> being prepended. If that is actually intended, the symbol name should
> be PQTRACE_NOOUTPUT_TIMESTAMP. Otherwise, the doc need to be fixed.

I'm pretty sure I named the flag PQTRACE_SUPPRESS_TIMESTAMP (and I
prefer SUPPRESS to NOOUTPUT), because the idea is that the timestamp is
printed by default.  I think that's the sensible decision: applications
prefer to have timestamps, even if there's a tiny bit of overhead.  We
don't want to force them to pass a flag for that.  We only want the
no-timestamp behavior in order to be able to use it for libpq internal
testing.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                              (Brian Kernighan)



Re: libpq debug log

From
'Alvaro Herrera'
Date:
On 2021-Feb-03, tsunakawa.takay@fujitsu.com wrote:

> (39)
> +      of tracing.  If (<literal>flags</literal> & <literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>) is
> +      true, then timestamp is not printed with each message.
> 
> The flag name (OUTPUT) and its description (not printed) doesn't match.
> 
> I think you can use less programmatical expression like "If <literal>flags</literal> contains
<literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>".

I copied the original style from elsewhere in the manual.

> (41)
> +void
> +PQtraceEx(PGconn *conn, FILE *debug_port, int flags)
> +{

I'm not really sure about making this a separate API call. We could just
make it PQtrace() and increment the libpq so version.  I don't think
it's a big deal, frankly.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"El conflicto es el camino real hacia la unión"



Re: libpq debug log

From
'Alvaro Herrera'
Date:
On 2021-Feb-03, tsunakawa.takay@fujitsu.com wrote:

> From: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>
> > > +        conn->fe_msg->num_fields != DEF_FE_MSGFIELDS)
> 
> > The rationale for that second condition is this: if the memory allocated
> > is the initial size, we don't free memory, because it would just be
> > allocated of the same size next time, and that size is not very big, so
> > it's not a big deal if we just let it be, so that it is reused if we
> > call PQtrace() again later.  However, if the allocated size is larger
> > than default, then it is possible that some previous tracing run has
> > enlarged the trace struct to a very large amount of memory, and we don't
> > want to leave that in place.
> 
> Ah, understood.  In that case, num_fields should be max_fields.

Oh, of course.


-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."                                    (Michael Brusser)



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> I'm pretty sure I named the flag PQTRACE_SUPPRESS_TIMESTAMP (and I
> prefer SUPPRESS to NOOUTPUT), because the idea is that the timestamp is
> printed by default.  I think that's the sensible decision: applications
> prefer to have timestamps, even if there's a tiny bit of overhead.  We
> don't want to force them to pass a flag for that.  We only want the
> no-timestamp behavior in order to be able to use it for libpq internal
> testing.

Agreed.  It makes sense to print timestamps by default because this feature will be used to diagnose slowness outside
thedatabase server.  (I misunderstood the motivation to introduce the flag).
 




Regards
Takayuki Tsunakawa



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>
> > (41)
> > +void
> > +PQtraceEx(PGconn *conn, FILE *debug_port, int flags)
> > +{
> 
> I'm not really sure about making this a separate API call. We could just
> make it PQtrace() and increment the libpq so version.  I don't think
> it's a big deal, frankly.

If we change the function signature, we have to bump the so major version and thus soname.  The libpq's current so
majorversion is 5, which hasn't been changed since 2006.  I'm hesitant to change it for this feature.  If you think we
canbump the version to 6, I think we can go.
 


https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html
--------------------------------------------------
try to make sure that your libraries are either backwards-compatible or that you've incremented the version number in
thesoname every time you make an incompatible change.
 

When a new version of a library is binary-incompatible with the old one the soname needs to change. In C, there are
fourbasic reasons that a library would cease to be binary compatible:
 

1. The behavior of a function changes so that it no longer meets its original specification,

2. Exported data items change (exception: adding optional items to the ends of structures is okay, as long as those
structuresare only allocated within the library).
 

3. An exported function is removed.

4. The interface of an exported function changes.
--------------------------------------------------



Regards
Takayuki Tsunakawa



RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
HI all,

I update the patch.
I modified code according to review comments of Tsunakawa san and Horiguchi san.

And I fixed some bugs.

> This patch should address the following:
> 1. fix 3 bugs
>   1.1 -1 output in "Query" message

The cause of this bug is that it call in pqFlush() function before flushing.
The purpose of calling pqLogFrontendMsg() here is to log the data that was in the buffer but not logged before
flushing.
The second argument of pqLogFrontendMsg() is message length, but we can't find it in pqFlush(). 
Therefor,  -1 was set here as a second argument and it was logged.
I thought about keeping the length as fields, but from the output Kirk san tried, this doesn't seem to happen.
Also, the message from the frontend to the backend calls pqPutMsgEnd () in advance, so the message should already be
logged.
So I deleted this call.

>   1.2 two message output in "ReadyForQuery" message

In the pqParseInput3 (), when the state becomes PGASYNC_IDLE, it returns to the caller. 
After that,  pqParseInput3 () is called again and protocol message and length are output again by calling pqGetc(),
pqGetInt().
 
To limit this, set the skipLogging flag when the status become PGASYNC_IDLE and fixed to skip the next pqGetc(),
pqGetInt().

>   1.3 "StartupMessage" output as " UnknownMessage "

I moved the code that handles the output of "SSLRequest", "StartupMessage", etc. 
to pqLogFrontendMsg() which is the function that outputs the front-end message.


> 2. creating new file for new tracing functions

I create new files libpq-logging.c and libpq-logging.h

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Wednesday, February 3, 2021 10:27 AM

> (39)
> +      of tracing.  If (<literal>flags</literal> &
> <literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>) is
> +      true, then timestamp is not printed with each message.
> 
> The flag name (OUTPUT) and its description (not printed) doesn't match.

I changed flag name to PQTRACE_SUPPRESS_TIMESTAMPS.

> 
> I think you can use less programmatical expression like "If
> <literal>flags</literal> contains
> <literal>PQTRACE_OUTPUT_TIMESTAMPS</literal>".
I fixed.

> (40)
> diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
> +PQtraceEx                 180
> \ No newline at end of file
> 
> What's the second line?  Isn't the file missing an empty line at the end?

I delete it.


> (41)
> +void
> +PQtraceEx(PGconn *conn, FILE *debug_port, int flags) {
> +    if (conn == NULL)
> +        return;
> ...
> +    if (!debug_port)
> +        return;
> 
> The if should better be as follows to match the style of existing surrounding
> code.
> 
> +    if (debug_port == NULL)

I fixed it.

> (42)
> +pqLogFormatTimestamp(char *timestr, Size ts_len)
> 
> I think you should use int or size_t instead of Size here, because other libpq
> code uses them.  int should be enough.  If the compiler gives warnings,
> prepend "(int)" before sizeof() at call sites.

I fixed it.

 
> (43)
> +    /* append microseconds */
> +    sprintf(timestr + strlen(timestr), ".%06d", (int) (tval.tv_usec /
> +1000));
> 
> "/ 1000" should be removed.

I removed it.
 
> (44)
> +    if ((conn->traceFlags & PQTRACE_OUTPUT_TIMESTAMPS) == 0)
> +        timestr_p = pqLogFormatTimestamp(timestr,
> sizeof(timestr));
> 
> == 0 should be removed.

I left it, referring Horiguchi san's comment.

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> I update the patch.
> I modified code according to review comments of Tsunakawa san and
> Horiguchi san.


I confirmed that all the previous feedback was reflected.  Here are some minor comments:


(45)
 void PQtrace(PGconn *conn, FILE *stream);
 </synopsis>
      </para>
 
+     <para>
+      Calls <function>PQtraceEx</function> to output with or without a timestamp
+      using <literal>flags</literal>.
+     </para>
+
+     <para>
+      <literal>flags</literal> contains flag bits describing the operating mode
+      of tracing.  If (<literal>flags</literal> contains <literal>PQTRACE_SUPPRESS_TIMESTAMPS</literal>),
+      then timestamp is not printed with each message.

The description of PQtrace() should be written independent of PQtraceEx().  It is an unnecessary implementation detail
tothe user that PQtrace() calls PQtraceEx() internally.  Plus, a separate entry for PQtraceEx() needs to be added.
 


(46)

If skipLogging is intended for use with backend -> frontend messages only, shouldn't it be placed in conn->b_msg?


(47)
+    /* Deallocate FE/BE message tracking memory. */
+    if (conn->fe_msg &&
+        /*
+         * If fields is allocated the initial size, we reuse it next time,
+         * because it would be allocated same size and the size is not big.
+         */
+            conn->fe_msg->max_fields != DEF_FE_MSGFIELDS)

I'm not completely sure if other places interpose a block comment like this between if/for/while conditions, but I
thinkit's better to put the comment before if.
 


Regards
Takayuki Tsunakawa



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Tue, 9 Feb 2021 02:26:25 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> > I update the patch.
> > I modified code according to review comments of Tsunakawa san and
> > Horiguchi san.
> 
> 
> I confirmed that all the previous feedback was reflected.  Here are some minor comments:
> 
> 
> (45)
>  void PQtrace(PGconn *conn, FILE *stream);
>  </synopsis>
>       </para>
>  
> +     <para>
> +      Calls <function>PQtraceEx</function> to output with or without a timestamp
> +      using <literal>flags</literal>.
> +     </para>
> +
> +     <para>
> +      <literal>flags</literal> contains flag bits describing the operating mode
> +      of tracing.  If (<literal>flags</literal> contains <literal>PQTRACE_SUPPRESS_TIMESTAMPS</literal>),
> +      then timestamp is not printed with each message.
> 
> The description of PQtrace() should be written independent of PQtraceEx().  It is an unnecessary implementation
detailto the user that PQtrace() calls PQtraceEx() internally.  Plus, a separate entry for PQtraceEx() needs to be
added.

This looks like a fusion of PQtrace and PQtraceEX. By the way, the
timestamp flag is needed at log emittion. So we can change the state
anytime.

PQtrace(conn, of);
PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS);
<logging without timestamps>
PQtraceSetFlags(conn, 0);
<logging with timestamps>
..


> (46)
> 
> If skipLogging is intended for use with backend -> frontend messages only, shouldn't it be placed in conn->b_msg?

The name skipLogging is somewhat obscure. The flag doesn't inhibit all
logs from being emitted.  It seems like it represents how far bytes
the logging mechanism consumed for the limited cases. Thus, I think it
can be a cursor variable like inCursor.

If we have conn->be_msg->inLogged, for example, pqGetc and
pqLogMessageByte1() are written as the follows.

pqGetc(char *result, PGconn *conn)
{
    if (conn->inCursor >= conn->inEnd)
        return EOF;

    *result = conn->inBuffer[conn->inCursor++];

    if (conn->Pfdebug)
        pqLogMessageByte1(conn, *result);

    return 0;
}

pqLogMessageByte1(...)
{
   switch()
   {
     case LOG_FIRST_BYTE:
       /* No point in logging already logged bytes */
       if (conn->be_msg->inLogged >= conn->inCursor)
         return;
   ...
   }
   conn->be_msg->inLogged = conn->inCursor;
}

(pqCheckInBufferSpace needs to adjust inLogged.)

I'm not sure this is easier to read than the skipLogging.

> (47)
> +    /* Deallocate FE/BE message tracking memory. */
> +    if (conn->fe_msg &&
> +        /*
> +         * If fields is allocated the initial size, we reuse it next time,
> +         * because it would be allocated same size and the size is not big.
> +         */
> +            conn->fe_msg->max_fields != DEF_FE_MSGFIELDS)
> 
> I'm not completely sure if other places interpose a block comment like this between if/for/while conditions, but I
thinkit's better to put the comment before if.
 

(s/is/are/)

Agreed. At least it doesn't look good.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > (45)
> This looks like a fusion of PQtrace and PQtraceEX. By the way, the
> timestamp flag is needed at log emittion. So we can change the state
> anytime.
>
> PQtrace(conn, of);
> PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS);
> <logging without timestamps>
> PQtraceSetFlags(conn, 0);
> <logging with timestamps>

I find this better because the application does not have to call PQuntrace() and PQtrace() again to enable/disable
timestampoutput, which requires passing the FILE pointer again.  (I don't imagine applications would repeatedly turn
loggingon and off in practice, though.) 


> > (46)
> >
> > If skipLogging is intended for use with backend -> frontend messages only,
> shouldn't it be placed in conn->b_msg?
>
> The name skipLogging is somewhat obscure. The flag doesn't inhibit all
> logs from being emitted.  It seems like it represents how far bytes
> the logging mechanism consumed for the limited cases. Thus, I think it
> can be a cursor variable like inCursor.
>
> If we have conn->be_msg->inLogged, for example, pqGetc and
> pqLogMessageByte1() are written as the follows.
>
> pqGetc(char *result, PGconn *conn)
> {
>     if (conn->inCursor >= conn->inEnd)
>         return EOF;
>
>     *result = conn->inBuffer[conn->inCursor++];
>
>     if (conn->Pfdebug)
>         pqLogMessageByte1(conn, *result);
>
>     return 0;
> }
>
> pqLogMessageByte1(...)
> {
>    switch()
>    {
>      case LOG_FIRST_BYTE:
>        /* No point in logging already logged bytes */
>        if (conn->be_msg->inLogged >= conn->inCursor)
>          return;
>    ...
>    }
>    conn->be_msg->inLogged = conn->inCursor;
> }

This looks better design because stuff like skipLogging is an internal state of logging facility whose use should be
restrictedto fe-logging.c. 

> (pqCheckInBufferSpace needs to adjust inLogged.)
>
> I'm not sure this is easier to read than the skipLogging.

I'd like Iwata-san to evaluate this and decide whether to take this approach or the current one.


Regards
Takayuki Tsunakawa





Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Mon, 8 Feb 2021 14:57:47 +0000, "iwata.aya@fujitsu.com" <iwata.aya@fujitsu.com> wrote in 
> I update the patch.
> I modified code according to review comments of Tsunakawa san and Horiguchi san.
> 
> And I fixed some bugs.

Thanks for the new version.

+typedef enum
+{
+    MSGDIR_FROM_BACKEND,
+    MSGDIR_FROM_FRONTEND
+} PGCommSource;

This is halfly exposed to other part of libpq. Specifically only
MSGDIR_FROM_BACKEND is used in fe-misc.c and only for
pgLogMessageString and pqLogMessagenchar. I would suggest to hide this
enum from fe-misc.c.

Looking into pqLogMessageString,

+pqLogMessageString(PGconn *conn, const char *v, int length, PGCommSource source)
+{
+    if (source == MSGDIR_FROM_BACKEND && conn->be_msg->state != LOG_CONTENTS)
+    {
+        pqLogInvalidProtocol(conn, MSGDIR_FROM_BACKEND);
+        return;    /* XXX ??? */
+    }
+
+    fprintf(conn->Pfdebug, "\"%s\"", v);
+    if (source == MSGDIR_FROM_BACKEND)
+        pqTraceMaybeBreakLine(length, conn);
+}

The only part that shared by both be and fe is the fprintf().  I think
it can be naturally split into separate functions for backend and
frontend messages.

Looking into pqLogMessagenchar,

+/*
+ * pqLogMessagenchar: output a string of exactly len bytes message to the log
+ */
+void
+pqLogMessagenchar(PGconn *conn, const char *v, int len, PGCommSource commsource)
+{
+    fprintf(conn->Pfdebug, "\'");
+    pqLogBinaryMsg(conn, v, len, commsource);
+    fprintf(conn->Pfdebug, "\'");
+    pqTraceMaybeBreakLine(len, conn);
+}

+static void
+pqLogBinaryMsg(PGconn *conn, const char *v, int length, PGCommSource source)
+{
+    int            i,
+                pin;
+
+    if (source == MSGDIR_FROM_BACKEND && conn->be_msg->state != LOG_CONTENTS)
+    {
+        pqLogInvalidProtocol(conn, MSGDIR_FROM_BACKEND);
+        return;    /* XXX ??? */

# What is this???

+    }
    .. shared part
+}

pqLogMessagenchar is the sole caller of pqLogBinaryMsg. So we can
refactor the two functions and have pqLogMessagenchar_for_be and
_for_fe without a fear of the side effect to other callers.  (The
names are discussed below.)


+typedef enum PGLogState
+{

This is libpq-logging.c internal type. It is not needed to be exposed.



+extern void pqTraceForcelyBreakLine(int size, PGconn *conn);
+extern void pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type, int length)+extern void pqStoreFeMsgStart(PGconn
*conn,char type);
 
+extern void pqLogFrontendMsg(PGconn *conn, int msgLen);
+extern void pqLogMessageByte1(PGconn *conn, char v);
+extern void pqLogMessageInt(PGconn *conn, int v, int length);
+extern void pqLogMessageString(PGconn *conn, const char *v, int length,
+                               PGCommSource commsource);
+extern void pqLogMessagenchar(PGconn *conn, const char *v, int length,
+                              PGCommSource commsource);

The API functions looks like randomly/inconsistently named and
designed.  I think that API should be in more structurally designed.

The comments about individual function names follow.

+/*
+ * pqTraceForcelyBreakLine:
+ *         If message is not completed, print a line break and reset.
+ */
+void
+pqTraceForcelyBreakLine(int size, PGconn *conn)
+{
+    fprintf(conn->Pfdebug, "\n");
+    pqTraceResetBeMsg(conn);
+}

Differently from the comment, this function doesn't work in a
conditional way nor in a forceful way. It is just putting a new line
and resetting the backend message variables.  I would name this as
pqTrace(Finish|Close)BeMsgLog().

+/*
+ * pqStoreFrontendMsg
+ *        Keep track of a from-frontend message that was just written to the
+ *        output buffer.
+ *
+ * Frontend messages are constructed piece by piece, and the message length
+ * is determined at the end, but sent to the server first; so for tracing
+ * purposes we store everything in memory and print to the trace file when
+ * the message is complete.
+ */
+void
+pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type, int length)
+{

I would name this as pqTrace(Store/Append)FeMsg().


+void
+pqStoreFeMsgStart(PGconn *conn, char type)
+{
+    conn->fe_msg->msg_type = type;
+}

The name says that "stores the message "start"". But it actually
stores the message type. I would name this as
pqTraceSetFeMsgType(). Or in contrast to pqTraceFinishBeMsgLog, this
can be named as pqTrace(Start|Begin|Init)BeMsgLog().


+ * pqLogFrontendMsg
+ *        Print accumulated frontend message pieces to the trace file.
+ */
+void
+pqLogFrontendMsg(PGconn *conn, int msgLen)

I would name this as pqTraceEmitFeMsgLog().


+ * pqLogMessageByte1: output 1 char from-backend message to the log
+ * pqLogMessageInt: output a 2- or 4-byte integer from-backend msg to the log

I would name this as pqTraceEmitBeByteLog(), pqTraceEmitBeIntLog()
respectively.

+ * pqLogMessageString: output a null-terminated string to the log

This function is used for both directions. pqTraceEmitStringLog(). If it is split into fe and be parts, they would be
pqTraceEmit(Be|Fe)StringLog().


+ * pqLogMessagenchar: output a string of exactly len bytes message to the log

This logs a byte sequence in hexadecimals. I would name that as
pqTraceEmitBytesLog(). Or pqTraceEmit(Be|Fe)BytesLog().



...I finish once here.

Is there any thoughts? Optinions on the namings?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Tue, 9 Feb 2021 08:10:19 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > > (45)
> > This looks like a fusion of PQtrace and PQtraceEX. By the way, the
> > timestamp flag is needed at log emittion. So we can change the state
> > anytime.
> > 
> > PQtrace(conn, of);
> > PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS);
> > <logging without timestamps>
> > PQtraceSetFlags(conn, 0);
> > <logging with timestamps>
> 
> I find this better because the application does not have to call PQuntrace() and PQtrace() again to enable/disable
timestampoutput, which requires passing the FILE pointer again.  (I don't imagine applications would repeatedly turn
loggingon and off in practice, though.)
 

Thanks and Yes, I also don't imagine that users change the timestmp
state repeatedly:p It's just a example.

> > > (46)
> > >
> > > If skipLogging is intended for use with backend -> frontend messages only,
> > shouldn't it be placed in conn->b_msg?
> > 
> > The name skipLogging is somewhat obscure. The flag doesn't inhibit all
> > logs from being emitted.  It seems like it represents how far bytes
> > the logging mechanism consumed for the limited cases. Thus, I think it
> > can be a cursor variable like inCursor.
> > 
> > If we have conn->be_msg->inLogged, for example, pqGetc and
> > pqLogMessageByte1() are written as the follows.
> > 
> > pqGetc(char *result, PGconn *conn)
> > {
> >     if (conn->inCursor >= conn->inEnd)
> >         return EOF;
> > 
> >     *result = conn->inBuffer[conn->inCursor++];
> > 
> >     if (conn->Pfdebug)
> >         pqLogMessageByte1(conn, *result);
> > 
> >     return 0;
> > }
> > 
> > pqLogMessageByte1(...)
> > {
> >    switch()
> >    {
> >      case LOG_FIRST_BYTE:
> >        /* No point in logging already logged bytes */
> >        if (conn->be_msg->inLogged >= conn->inCursor)
> >          return;
> >    ...
> >    }
> >    conn->be_msg->inLogged = conn->inCursor;
> > }
> 
> This looks better design because stuff like skipLogging is an internal state of logging facility whose use should be
restrictedto fe-logging.c.
 
> 
> > (pqCheckInBufferSpace needs to adjust inLogged.)
> > 
> > I'm not sure this is easier to read than the skipLogging.
> 
> I'd like Iwata-san to evaluate this and decide whether to take this approach or the current one.

+1

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
'Alvaro Herrera'
Date:
On 2021-Feb-04, tsunakawa.takay@fujitsu.com wrote:

> From: 'Alvaro Herrera' <alvherre@alvh.no-ip.org>
> > > (41)
> > > +void
> > > +PQtraceEx(PGconn *conn, FILE *debug_port, int flags)
> > > +{
> > 
> > I'm not really sure about making this a separate API call. We could just
> > make it PQtrace() and increment the libpq so version.  I don't think
> > it's a big deal, frankly.
> 
> If we change the function signature, we have to bump the so major version and thus soname.  The libpq's current so
majorversion is 5, which hasn't been changed since 2006.  I'm hesitant to change it for this feature.  If you think we
canbump the version to 6, I think we can go.
 

I think it's pretty clear we would not change the so-version for this
feature.


-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: libpq debug log

From
Alvaro Herrera
Date:
On 2021-Feb-09, Kyotaro Horiguchi wrote:

> This looks like a fusion of PQtrace and PQtraceEX. By the way, the
> timestamp flag is needed at log emittion. So we can change the state
> anytime.
> 
> PQtrace(conn, of);
> PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS);

I like this idea better than PQtraceEx().

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi all,

Thank you for your review. I update patch to v17.

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Tuesday, February 9, 2021 11:26 AM
> (45)
...
> The description of PQtrace() should be written independent of PQtraceEx().
> It is an unnecessary implementation detail to the user that PQtrace() calls
> PQtraceEx() internally.  Plus, a separate entry for PQtraceEx() needs to be
> added.

I add PQtraceSetFlags() description instead of PQtraceEx() in response to Horiguchi san's suggestion.

> (46)
>
> If skipLogging is intended for use with backend -> frontend messages only,
> shouldn't it be placed in conn->b_msg?

I moved skip flag to be_msg.

> (47)
...
> I'm not completely sure if other places interpose a block comment like this
> between if/for/while conditions, but I think it's better to put the comment
> before if.

I moved this comment to before if.

> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Sent: Tuesday, February 9, 2021 5:26 PM

> +typedef enum
> +{
> +    MSGDIR_FROM_BACKEND,
> +    MSGDIR_FROM_FRONTEND
> +} PGCommSource;
>
> This is halfly exposed to other part of libpq. Specifically only
> MSGDIR_FROM_BACKEND is used in fe-misc.c and only for
> pgLogMessageString and pqLogMessagenchar. I would suggest to hide this
> enum from fe-misc.c.
>
> Looking into pqLogMessageString,
>
> +pqLogMessageString(PGconn *conn, const char *v, int length,
> PGCommSource source)
> +{
> +    if (source == MSGDIR_FROM_BACKEND &&
> conn->be_msg->state != LOG_CONTENTS)
> +    {
> +        pqLogInvalidProtocol(conn, MSGDIR_FROM_BACKEND);
> +        return;    /* XXX ??? */
> +    }
> +
> +    fprintf(conn->Pfdebug, "\"%s\"", v);
> +    if (source == MSGDIR_FROM_BACKEND)
> +        pqTraceMaybeBreakLine(length, conn);
> +}
>
> The only part that shared by both be and fe is the fprintf().  I think
> it can be naturally split into separate functions for backend and
> frontend messages.
>
> Looking into pqLogMessagenchar,
>
> +/*
> + * pqLogMessagenchar: output a string of exactly len bytes message to the
> log
> + */
> +void
> +pqLogMessagenchar(PGconn *conn, const char *v, int len, PGCommSource
> commsource)
> +{
> +    fprintf(conn->Pfdebug, "\'");
> +    pqLogBinaryMsg(conn, v, len, commsource);
> +    fprintf(conn->Pfdebug, "\'");
> +    pqTraceMaybeBreakLine(len, conn);
> +}
>
> +static void
> +pqLogBinaryMsg(PGconn *conn, const char *v, int length, PGCommSource
> source)
> +{
> +    int            i,
> +                pin;
> +
> +    if (source == MSGDIR_FROM_BACKEND &&
> conn->be_msg->state != LOG_CONTENTS)
> +    {
> +        pqLogInvalidProtocol(conn, MSGDIR_FROM_BACKEND);
> +        return;    /* XXX ??? */
>
> # What is this???
>
> +    }
>     .. shared part
> +}
>
> pqLogMessagenchar is the sole caller of pqLogBinaryMsg. So we can
> refactor the two functions and have pqLogMessagenchar_for_be and
> _for_fe without a fear of the side effect to other callers.  (The
> names are discussed below.)

Thank you for your advice on refactoring.
Separating pqLogMessagenchar() into pqLogMessagenchar_for_be and pqLogMessagenchar_for_fe
seemed like adding more similar code. So I didn't work on it.

> +typedef enum PGLogState
> +{
>
> This is libpq-logging.c internal type. It is not needed to be exposed.

I fixed it.

> +extern void pqTraceForcelyBreakLine(int size, PGconn *conn);
> +extern void pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type,
> int length)+extern void pqStoreFeMsgStart(PGconn *conn, char type);
> +extern void pqLogFrontendMsg(PGconn *conn, int msgLen);
> +extern void pqLogMessageByte1(PGconn *conn, char v);
> +extern void pqLogMessageInt(PGconn *conn, int v, int length);
> +extern void pqLogMessageString(PGconn *conn, const char *v, int length,
> +                               PGCommSource
> commsource);
> +extern void pqLogMessagenchar(PGconn *conn, const char *v, int length,
> +                              PGCommSource
> commsource);
>
> The API functions looks like randomly/inconsistently named and
> designed.  I think that API should be in more structurally designed.
>
> The comments about individual function names follow.

Thank you for your advice. I changed these functions name.
>
> +/*
> + * pqTraceForcelyBreakLine:
> + *         If message is not completed, print a line break and reset.
> + */
> +void
> +pqTraceForcelyBreakLine(int size, PGconn *conn)
> +{
> +    fprintf(conn->Pfdebug, "\n");
> +    pqTraceResetBeMsg(conn);
> +}
>
> Differently from the comment, this function doesn't work in a
> conditional way nor in a forceful way. It is just putting a new line
> and resetting the backend message variables.  I would name this as
> pqTrace(Finish|Close)BeMsgLog().

This function can be used when a connection is lost or when the copyData result is ignored according to the original
code.
It is called to reset halfway logging. So I want to know that it will be forced to quit.
Therefore, I changed the name as pqTraceForcelyFinishBeMsgLog().

>
> +/*
> + * pqStoreFrontendMsg
> + *        Keep track of a from-frontend message that was just written
> to the
> + *        output buffer.
> + *
> + * Frontend messages are constructed piece by piece, and the message
> length
> + * is determined at the end, but sent to the server first; so for tracing
> + * purposes we store everything in memory and print to the trace file when
> + * the message is complete.
> + */
> +void
> +pqStoreFrontendMsg(PGconn *conn, PGLogMsgDataType type, int length)
> +{
>
> I would name this as pqTrace(Store/Append)FeMsg().

I renamed it to pqTraceStoreFeMsg(). Because people who do not know this feature is confused due to the name.
It is difficult to understand why store message during the logging.
>
>
> +void
> +pqStoreFeMsgStart(PGconn *conn, char type)
> +{
> +    conn->fe_msg->msg_type = type;
> +}
>
> The name says that "stores the message "start"". But it actually
> stores the message type. I would name this as
> pqTraceSetFeMsgType(). Or in contrast to pqTraceFinishBeMsgLog, this
> can be named as pqTrace(Start|Begin|Init)BeMsgLog().

I think pqTraceSetFeMsgType() is better because this function's behavior is just set first byte1 message from Frontend.
>
>
> + * pqLogFrontendMsg
> + *        Print accumulated frontend message pieces to the trace file.
> + */
> +void
> +pqLogFrontendMsg(PGconn *conn, int msgLen)
>
> I would name this as pqTraceEmitFeMsgLog().

I would name this as pqTraceLogFeMsg().
It is shorter and we can easy to understand what this function do.

> + * pqLogMessageByte1: output 1 char from-backend message to the log
> + * pqLogMessageInt: output a 2- or 4-byte integer from-backend msg to the
> log
>
> I would name this as pqTraceEmitBeByteLog(), pqTraceEmitBeIntLog()
> respectively.

I think log and emit would mean the same thing in this case. And log is more easy to understand in this case.
So I changed name to pqTraceLogBeMsgByte1() and pqTraceLogBeMsgInt().

>
> + * pqLogMessageString: output a null-terminated string to the log
>
> This function is used for both directions. pqTraceEmitStringLog(). If it is split
> into fe and be parts, they would be pqTraceEmit(Be|Fe)StringLog().

I changed name to pqTraceLogMsgString().


> + * pqLogMessagenchar: output a string of exactly len bytes message to the
> log
>
> This logs a byte sequence in hexadecimals. I would name that as
> pqTraceEmitBytesLog(). Or pqTraceEmit(Be|Fe)BytesLog().

It is named to refer to pqGetnchar/pqPutnchar. So I changed this name to pqTraceLogMsgnchar().

> ...I finish once here.

Thank you for your review.

> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Sent: Tuesday, February 9, 2021 5:30 PM

> > > PQtrace(conn, of);
> > > PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS); <logging
> without
> > > timestamps> PQtraceSetFlags(conn, 0); <logging with timestamps>
> >
> > I find this better because the application does not have to call
> > PQuntrace() and PQtrace() again to enable/disable timestamp output,
> > which requires passing the FILE pointer again.  (I don't imagine
> > applications would repeatedly turn logging on and off in practice,
> > though.)
>
> Thanks and Yes, I also don't imagine that users change the timestmp state
> repeatedly:p It's just a example.

I implemented PQtraceSetFlags() function. And add documentation of PQtraceSetFlags().

> > > The name skipLogging is somewhat obscure. The flag doesn't inhibit
> > > all logs from being emitted.  It seems like it represents how far
> > > bytes the logging mechanism consumed for the limited cases. Thus, I
> > > think it can be a cursor variable like inCursor.
...
> > > I'm not sure this is easier to read than the skipLogging.
> >
> > I'd like Iwata-san to evaluate this and decide whether to take this approach
> or the current one.
>
> +1

I thought this idea was very good and tried it easily. But this didn't work...
So I didn't work on this implementation because the code can be more complex.

Regards
Aya Iwata
Fujits


Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
(48)
 <synopsis>
 void PQtrace(PGconn *conn, FILE *stream);
 </synopsis>
      </para>

+     <para>
+      Calls <function>PQtraceSetFlags</function> to output with or without a timestamp.
+     </para>
+
      <note>

Why is this necessary?  Even if you decide to remove this change, can you share your idea on why you added this just in
case?


(49)
+      Determine to output tracing with or without a timestamp to a debugging file stream.

The description should be the overall functionality of this function considering the future additions of flags.
Somethinglike: 

Controls the tracing behavior of client/server communication.

+      then timestamp is not printed with each message. The default is output with timestamp.

The default value for flags argument (0) should be described here.

Also, it should be added that PQsetTraceFlags() can be called before or after the PQtrace() call.


(50)
I'd like you to consider skipLogging again, because it seems that such a variable is for the logging machinery to
controlstate transitions internally in fe-logging.c. 
What I'm concerned is that those who modify libpq have to be aware of skipLogging and would fail to handle it.


(51)
>> +typedef enum PGLogState
>> +{
>>
>> This is libpq-logging.c internal type. It is not needed to be exposed.

> I fixed it.

How did you fix it?  The typedef is still in .h file.  It should be in .h, shouldn't it?  Houw about the other
typedefs?



I won't comment on the other stuff on function naming.


Regards
Takayuki Tsunakawa





RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi all,

I update patch to v18. It has been fixed in response to Tsunakawa san's review.

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Friday, February 12, 2021 1:53 PM
>
> (48)
>  <synopsis>
>  void PQtrace(PGconn *conn, FILE *stream);
>  </synopsis>
>       </para>
>
> +     <para>
> +      Calls <function>PQtraceSetFlags</function> to output with or
> without a timestamp.
> +     </para>
> +
>       <note>
>
> Why is this necessary?  Even if you decide to remove this change, can you
> share your idea on why you added this just in case?

In the previous patch, we described the PQTRACE_SUPPRESS_TIMESTAMPS flag in the PQtrace () part.
And I think PQtrace () and PQtraceSetFlag () are closely related functions.
It would be nice for the user to know both features before using the logging feature.
So I added a description of PQtraceSetFlag () to the paragraph of PQtrace ().
However, the PQtraceSetFlag () documentation is in the next paragraph of PQtrace (),
so users can easily notice the PQtraceSetFlag () function. So I removed it.

> (49)
> +      Determine to output tracing with or without a timestamp to a
> debugging file stream.
>
> The description should be the overall functionality of this function considering
> the future additions of flags.  Something like:
>
> Controls the tracing behavior of client/server communication.
>
> +      then timestamp is not printed with each message. The default is
> output with timestamp.
>
> The default value for flags argument (0) should be described here.
>
> Also, it should be added that PQsetTraceFlags() can be called before or after
> the PQtrace() call.

I fixed documentation like this;
 If set to 0 (default),tracing will be output with timestamp.
 This function should be called after calling <function>PQtrace</function>.

> (50)
> I'd like you to consider skipLogging again, because it seems that such a
> variable is for the logging machinery to control state transitions internally in
> fe-logging.c.
> What I'm concerned is that those who modify libpq have to be aware of
> skipLogging and would fail to handle it.

To implement inLogging, logging skip method has been modified to be completed within the libpq-logging.c file.
The protocol message consists of First Byte, Length, and Contents.
When the first byte and length arrive, if contents is not yet in the buffer,
the process of reading the first byte and length is repeated.
Reloading is necessary to parse the protocol message and proceed,
but we want to skip logging because it is not necessary to output the message which already log to file.

To do this, remember the value of inCursor in the inLogging variable and record how far logging has progressed.
In the pqTraceLogBeMsgByte1() and pqTraceLogBeMsgInt(),
the value of inCursor at that time is stored in inLogging after outputting the log to a file.
If inCursor is smaller than inLogging, it exits the log output function without doing anything.
If all log messages are output, initialize the inLogging.

Changed the pqTraceMaybeBreakLine() function to return whether a line break has occurred
in order to consolidate the process of recording the cursor in one place. If it break line,
inLogging is initialized and inCursor is not remembered inLogging.

Also, as Horiguchi san mentioned, I changed to update the value of inLogging
even in the function where the value of inCursor is changed (Ex. pqCheckInBufferSpace () and pqReadData ()).


> (51)
> >> +typedef enum PGLogState
> >> +{
> >>
> >> This is libpq-logging.c internal type. It is not needed to be exposed.
>
> > I fixed it.
>
> How did you fix it?  The typedef is still in .h file.  It should be in .h, shouldn't
> it?  Houw about the other typedefs?

I didn’t move PGLogState to .c file because PGLogState is a member of be_msg which is referred from other files.


Regards,
Aya Iwata
Fujitsu

Attachment

Re: libpq debug log

From
Alvaro Herrera
Date:
Hello

In pqTraceOutputString(), you can use the return value from fprintf to
move the cursor -- no need to count chars.

I still think that the message-type specific functions should print the
message type, rather than having the string arrays.

-- 
Álvaro Herrera       Valdivia, Chile
"La gente vulgar sólo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
I'll look at the comments from Alvaro-san and Horiguchi-san.  Here are my review comments:


(23)
+    /* Trace message only when there is first 1 byte */
+    if (conn->Pfdebug && conn->outCount < conn->outMsgStart)
+    {
+        if (conn->outCount < conn->outMsgStart)
+            pqTraceOutputMessage(conn, conn->outBuffer + conn->outCount, true);
+        else
+            pqTraceOutputNoTypeByteMessage(conn,
+                                        conn->outBuffer + conn->outMsgStart);
+    }

The inner else path doesn't seem to be reached because both the outer and inner if contain the same condition.  I think
youwant to remove the second condition from the outer if. 


(24) pqTraceOutputNoTypeByteMessage
+        case 16:    /* CancelRequest */
+            fprintf(conn->Pfdebug, "%s\t>\tCancelRequest\t%d", timestr, length);
+            pqTraceOutputInt32(message, &LogCursor, conn->Pfdebug);
+            pqTraceOutputInt32(message, &LogCursor, conn->Pfdebug);
+            break;

Another int32 data needs to be output as follows:

--------------------------------------------------
Int32(80877102)
The cancel request code. The value is chosen to contain 1234 in the most significant 16 bits, and 5678 in the least
significant16 bits. (To avoid confusion, this code must not be the same as any protocol version number.) 

Int32
The process ID of the target backend.

Int32
The secret key for the target backend.
--------------------------------------------------


(25)
+        case 8 :    /* GSSENRequest or SSLRequest */

GSSENRequest -> GSSENCRequest


(26)
+static void
+pqTraceOutputByte1(const char *data, int *cursor, FILE *pfdebug)
+{
+    const char *v = data + *cursor;
+    /*

+static void
+pqTraceOutputf(const char *message, int end, FILE *f)
+{
+    int    cursor = 0;
+    pqTraceOutputString(message, &cursor, end, f);
+}

Put an empty line to separate the declaration part and execution part.


(27)
+    const char    *message_type = "UnkownMessage";
+
+    id = message[LogCursor++];
+
+    memcpy(&length, message + LogCursor , 4);
+    length = (int) pg_ntoh32(length);
+    LogCursor += 4;
+    LogEnd = length - 4;

+    /* Get a protocol type from first byte identifier */
+    if (toServer &&
+        id < lengthof(protocol_message_type_f) &&
+        protocol_message_type_f[(unsigned char)id] != NULL)
+        message_type = protocol_message_type_f[(unsigned char)id];
+    else if (!toServer &&
+        id < lengthof(protocol_message_type_b) &&
+        protocol_message_type_b[(unsigned char)id] != NULL)
+        message_type = protocol_message_type_b[(unsigned char)id];
+    else
+    {
+        fprintf(conn->Pfdebug, "Unknown message: %02x\n", id);
+        return;
+    }
+

The initial value "UnkownMessage" is not used.  So, you can initialize message_type with NULL and do like:

+     if (...)
+        ...
+    else if (...)
+        ...
+
+    if (message_type == NULL)
+    {
+        fprintf(conn->Pfdebug, "Unknown message: %02x\n", id);
+        return;
+

Plus, I think this should be done before looking at the message length.


(28)
pqTraceOutputBinary() is only used in pqTraceOutputNchar().  Do they have to be separated?


Regards
Takayuki Tsunakawa




RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> In pqTraceOutputString(), you can use the return value from fprintf to
> move the cursor -- no need to count chars.

Yes, precisely, 2 bytes for the double quotes needs to be subtracted as follows:

    len = fprintf(...);
    *cursor += (len - 2);


> I still think that the message-type specific functions should print the
> message type, rather than having the string arrays.

I sort of think so to remove the big arrays.  But to minimize duplicate code, I think the code structure will look
like:

    fprintf(timestamp, length);
    switch (type)
    {
        case '?':
            pqTraceOutput?(...);
        break;
        case '?':
            /* No message content */
            fprintf("message_type_name");
        break;
    }    

void
pqTraceOutput?(...)
{
    fprintf("message_type_name");
    print message content;    
}

The order of length and message type is reversed.  The .sgml should also be changed accordingly.  What do you think?

Iwata-san, 
Why don't you submit v27 patch with the current arrays kept, and then v28 with the arrays removed soon after?


From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> It would help when the value is "255", which is confusing between -1
> (or 255) in byte or 255 in 2-byte word. Or when the value looks like
> broken. I'd suggest "b"yte for 1 byte, "s"hort for 2 bytes, "l"ong for
> 4 bytes ('l' is confusing with '1', but anyway it is not needed)..

I don't have a strong opinion on this.  (I kind of think I want to see unprefixed numbers; readers will look at the
protocolreference anyway.)  I'd like to leave this up to Iwata-san and Alvaro-san.
 


    Regards
Takayuki Tsunakawa}


RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Alvaro san and Tsunakawa san,

Thank you for your review. I updated patch to v27.

`make check` output is following. I think it is OK.
```
2021-03-18 07:02:55.090598  <   ReadyForQuery   5 I
2021-03-18 07:02:55.090672  >   Terminate   4
2021-03-18 07:02:55.120492  >   Query   155 "CREATE TABLESPACE regress_tblspacewith LOCATION
'/home/iwata/pgsql/postgres/src/test/regress/testtablespace'WITH (some_nonexistent_parameter = true);" 
2021-03-18 07:02:55.121624  <   ErrorResponse   124 S "ERROR" V "ERROR" C "22023" M "unrecognized parameter
"some_nonexistent_parameter""F "reloptions.c" L "1447" R "parseRelOptionsInternal" \x00 "Z" 
2021-03-18 07:02:55.121664  <   ReadyForQuery   5 I
2021-03-18 07:02:55.121728  >   Query   144 "CREATE TABLESPACE regress_tblspacewith LOCATION
'/home/iwata/pgsql/postgres/src/test/regress/testtablespace'WITH (random_page_cost = 3.0);" 
2021-03-18 07:02:55.123335  <   CommandComplete 22 "CREATE TABLESPACE"
2021-03-18 07:02:55.123400  <   ReadyForQuery   5 I
2021-03-18 07:02:55.123460  >   Query   81 "SELECT spcoptions FROM pg_tablespace WHERE spcname =
'regress_tblspacewith';"
2021-03-18 07:02:55.126556  <   RowDescription  35 1 "spcoptions" 1213 5 1009 65535 -1 0
2021-03-18 07:02:55.126594  <   DataRow 32 1 22 '{random_page_cost=3.0}'
2021-03-18 07:02:55.126607  <   CommandComplete 13 "SELECT 1"
2021-03-18 07:02:55.126617  <   ReadyForQuery   5 I
```

> Iwata-san,
> Why don't you submit v27 patch with the current arrays kept, and then v28
> with the arrays removed soon after?

And I will try to remove byte1 type table for v28 patch.

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Thursday, March 18, 2021 12:38 PM
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> > In pqTraceOutputString(), you can use the return value from fprintf to
> > move the cursor -- no need to count chars.
>
> Yes, precisely, 2 bytes for the double quotes needs to be subtracted as
> follows:
>
>     len = fprintf(...);
>     *cursor += (len - 2);

Thank you for your advice. I changed pqTraceOutputString set cursor to fprintf  return -2.
And I removed cursor movement from that function.

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Thursday, March 18, 2021 11:52 AM
...
> I'll look at the comments from Alvaro-san and Horiguchi-san.  Here are my
> review comments:

Thank you for your review. I am sorry previous patch contain some mistake.

> (23)
> +    /* Trace message only when there is first 1 byte */
> +    if (conn->Pfdebug && conn->outCount < conn->outMsgStart)
> +    {
> +        if (conn->outCount < conn->outMsgStart)
> +            pqTraceOutputMessage(conn, conn->outBuffer +
> conn->outCount, true);
> +        else
> +            pqTraceOutputNoTypeByteMessage(conn,
> +
>     conn->outBuffer + conn->outMsgStart);
> +    }
>
> The inner else path doesn't seem to be reached because both the outer and
> inner if contain the same condition.  I think you want to remove the second
> condition from the outer if.

Yes, I remove second condition.

> (24) pqTraceOutputNoTypeByteMessage
> +        case 16:    /* CancelRequest */
> +            fprintf(conn->Pfdebug,
> "%s\t>\tCancelRequest\t%d", timestr, length);
> +            pqTraceOutputInt32(message, &LogCursor,
> conn->Pfdebug);
> +            pqTraceOutputInt32(message, &LogCursor,
> conn->Pfdebug);
> +            break;
>
> Another int32 data needs to be output as follows:
>
> --------------------------------------------------
> Int32(80877102)
> The cancel request code. The value is chosen to contain 1234 in the most
> significant 16 bits, and 5678 in the least significant 16 bits. (To avoid
> confusion, this code must not be the same as any protocol version number.)
>
> Int32
> The process ID of the target backend.
>
> Int32
> The secret key for the target backend.
> --------------------------------------------------

Thank you. I read document again and I add one pqTraceOutputInt32().

> (25)
> +        case 8 :    /* GSSENRequest or SSLRequest */
>
> GSSENRequest -> GSSENCRequest

Thank you. I fixed.


> (26)
> +static void
> +pqTraceOutputByte1(const char *data, int *cursor, FILE *pfdebug) {
> +    const char *v = data + *cursor;
> +    /*
>
> +static void
> +pqTraceOutputf(const char *message, int end, FILE *f) {
> +    int    cursor = 0;
> +    pqTraceOutputString(message, &cursor, end, f); }
>
> Put an empty line to separate the declaration part and execution part.

Thank you. I fixed this. I add space anywhere in pqTraceOutput? function.

> (27)
> +    const char    *message_type = "UnkownMessage";
> +
> +    id = message[LogCursor++];
> +
> +    memcpy(&length, message + LogCursor , 4);
> +    length = (int) pg_ntoh32(length);
> +    LogCursor += 4;
> +    LogEnd = length - 4;
>
> +    /* Get a protocol type from first byte identifier */
> +    if (toServer &&
> +        id < lengthof(protocol_message_type_f) &&
> +        protocol_message_type_f[(unsigned char)id] != NULL)
> +        message_type = protocol_message_type_f[(unsigned
> char)id];
> +    else if (!toServer &&
> +        id < lengthof(protocol_message_type_b) &&
> +        protocol_message_type_b[(unsigned char)id] != NULL)
> +        message_type = protocol_message_type_b[(unsigned
> char)id];
> +    else
> +    {
> +        fprintf(conn->Pfdebug, "Unknown message: %02x\n", id);
> +        return;
> +    }
> +
>
> The initial value "UnkownMessage" is not used.  So, you can initialize
> message_type with NULL and do like:
>
> +     if (...)
> +        ...
> +    else if (...)
> +        ...
> +
> +    if (message_type == NULL)
> +    {
> +        fprintf(conn->Pfdebug, "Unknown message: %02x\n", id);
> +        return;
> +
>
> Plus, I think this should be done before looking at the message length.

I initialized message_type as NULL,  changed `else` to `if (message_type == NULL)`
and move message type setup code to after setting id.

> (28)
> pqTraceOutputBinary() is only used in pqTraceOutputNchar().  Do they have
> to be separated?

I see. I merged this code to pqTraceOutputBinary().

Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> > Yes, precisely, 2 bytes for the double quotes needs to be subtracted
> > as
> > follows:
> >
> >     len = fprintf(...);
> >     *cursor += (len - 2);
>
> Thank you for your advice. I changed pqTraceOutputString set cursor to fprintf
> return -2.
> And I removed cursor movement from that function.

Ouch, not 2 but 3, to include a single whitespace at the beginning.

The rest looks good.  I hope we're almost at the finish line.


    Regards
Takayuki Tsunakawa}



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Thu, 18 Mar 2021 07:34:36 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> > > Yes, precisely, 2 bytes for the double quotes needs to be subtracted
> > > as
> > > follows:
> > >
> > >     len = fprintf(...);
> > >     *cursor += (len - 2);
> > 
> > Thank you for your advice. I changed pqTraceOutputString set cursor to fprintf
> > return -2.
> > And I removed cursor movement from that function.
> 
> Ouch, not 2 but 3, to include a single whitespace at the beginning.
> 
> The rest looks good.  I hope we're almost at the finish line.

Maybe.

At Wed, 17 Mar 2021 13:36:32 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in 
> In pqTraceOutputString(), you can use the return value from fprintf to
> move the cursor -- no need to count chars.
> 
> I still think that the message-type specific functions should print the
> message type, rather than having the string arrays.

In other words, pqTraceOutputMessage recognizes message id and calls
the function corresponding one-on-one to the id. So the functions
knows what is the message type of myself and there's no reason for
pqTraceOutputMessage to print the message type on its behalf.


+ pqTraceOutputR(const char *message, FILE *f)
+ {
+     int    cursor = 0;
+ 
+     pqTraceOutputInt32(message, &cursor, f);

I don't understand the reason for spliting message and &cursor here.

+ pqTraceOutputR(const char *message, FILE *f)
+ {
+     char *p = message;
+ 
+     pqTraceOutputInt32(&p, f);

works well.


+/* RowDescription */
+static void
+pqTraceOutputT(const char *message, int end, FILE *f)
+{
+    int    cursor = 0;
+    int nfields;
+    int    i;
+
+    nfields = pqTraceOutputInt16(message, &cursor, f);
+
+    for (i = 0; i < nfields; i++)
+    {
+        pqTraceOutputString(message, &cursor, end, f);
+        pqTraceOutputInt32(message, &cursor, f);
+        pqTraceOutputInt16(message, &cursor, f);
+        pqTraceOutputInt32(message, &cursor, f);
+        pqTraceOutputInt16(message, &cursor, f);
+        pqTraceOutputInt32(message, &cursor, f);
+        pqTraceOutputInt16(message, &cursor, f);
+    }
+}

I didn't looked closer, but lookong the usage of the variable "end",
something's wrong in the function.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> + pqTraceOutputR(const char *message, FILE *f)
> + {
> +     int    cursor = 0;
> +
> +     pqTraceOutputInt32(message, &cursor, f);
>
> I don't understand the reason for spliting message and &cursor here.
>
> + pqTraceOutputR(const char *message, FILE *f)
> + {
> +     char *p = message;
> +
> +     pqTraceOutputInt32(&p, f);
>
> works well.

Yes, that would also work.  But I like the separate cursor + fixed starting point here, probably because it's sometimes
confusingto see the pointer value changed inside functions.  (And a pointer itself is an allergy for some people, not
tomention a pointer to ointer...)  Also, libpq uses cursors for network I/O buffers.  So, I think the patch can be as
itis now. 


> +static void
> +pqTraceOutputT(const char *message, int end, FILE *f)
> +{
> +    int    cursor = 0;
> +    int nfields;
> +    int    i;
> +
> +    nfields = pqTraceOutputInt16(message, &cursor, f);
> +
> +    for (i = 0; i < nfields; i++)
> +    {
> +        pqTraceOutputString(message, &cursor, end, f);
> +        pqTraceOutputInt32(message, &cursor, f);
> +        pqTraceOutputInt16(message, &cursor, f);
> +        pqTraceOutputInt32(message, &cursor, f);
> +        pqTraceOutputInt16(message, &cursor, f);
> +        pqTraceOutputInt32(message, &cursor, f);
> +        pqTraceOutputInt16(message, &cursor, f);
> +    }
> +}
>
> I didn't looked closer, but lookong the usage of the variable "end",
> something's wrong in the function.

Ah, end doesn't serve any purpose.  I guess the compiler emitted a warning "end is not used".  I think end can be
removed.



    Regards
Takayuki Tsunakawa}




RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Horiguchi san and Tsunakawa san,

Thank you for you review.

I update patch to v28.  In this patch, I removed array.
And I fixed some code according to Horiguchi san and Tsunakawa san review comment.

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Thursday, March 18, 2021 12:38 PM
> I sort of think so to remove the big arrays.  But to minimize duplicate code, I
> think the code structure will look like:
>
>     fprintf(timestamp, length);
>     switch (type)
>     {
>         case '?':
>             pqTraceOutput?(...);
>         break;
>         case '?':
>             /* No message content */
>             fprintf("message_type_name");
>         break;
>     }
>
> void
> pqTraceOutput?(...)
> {
>     fprintf("message_type_name");
>     print message content;
> }

The code follows above format.
And I changed .sgml documentation;
- Changed order of message length and type
- Removed byte-16 and byte-32 explanation because I removed # output in previous patch.

Output style is following;

```
2021-03-18 08:59:36.141660  <   5   ReadyForQuery I
2021-03-18 08:59:36.141723  >   4   Terminate
2021-03-18 08:59:36.173263  >   155 Query "CREATE TABLESPACE regress_tblspacewith LOCATION
'/home/iwata/pgsql/postgres/src/test/regress/testtablespace'WITH (some_nonexistent_parameter = true);" 
2021-03-18 08:59:36.174439  <   124 ErrorResponse S "ERROR" V "ERROR" C "22023" M "unrecognized parameter
"some_nonexistent_parameter""F "reloptions.c" L "1456" R "parseRelOptionsInternal" \x00 "Z" 
2021-03-18 08:59:36.174483  <   5   ReadyForQuery I
2021-03-18 08:59:36.174545  >   144 Query "CREATE TABLESPACE regress_tblspacewith LOCATION
'/home/iwata/pgsql/postgres/src/test/regress/testtablespace'WITH (random_page_cost = 3.0);" 
2021-03-18 08:59:36.176155  <   22  CommandComplete "CREATE TABLESPACE"
2021-03-18 08:59:36.176190  <   5   ReadyForQuery I
2021-03-18 08:59:36.176243  >   81  Query "SELECT spcoptions FROM pg_tablespace WHERE spcname =
'regress_tblspacewith';"                                                      
2021-03-18 08:59:36.179286  <   35  RowDescription 1 "spcoptions" 1213 5 1009 65535 -1 0
2021-03-18 08:59:36.179326  <   32  DataRow 1 22 '{random_page_cost=3.0}'
2021-03-18 08:59:36.179339  <   13  CommandComplete "SELECT 1"
2021-03-18 08:59:36.179349  <   5   ReadyForQuery I
2021-03-18 08:59:36.179504  >   42  Query "DROP TABLESPACE regress_tblspacewith;"
2021-03-18 08:59:36.180400  <   20  CommandComplete "DROP TABLESPACE"
2021-03-18 08:59:36.180432  <   5   ReadyForQuery I
```


> -----Original Message-----
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Sent: Thursday, March 18, 2021 5:30 PM
>
> At Thu, 18 Mar 2021 07:34:36 +0000, "tsunakawa.takay@fujitsu.com"
> <tsunakawa.takay@fujitsu.com> wrote in
> > From: Iwata, Aya/岩田 彩 <iwata.aya@fujitsu.com>
> > > > Yes, precisely, 2 bytes for the double quotes needs to be
> > > > subtracted as
> > > > follows:
> > > >
> > > >     len = fprintf(...);
> > > >     *cursor += (len - 2);
> > >
> > > Thank you for your advice. I changed pqTraceOutputString set cursor
> > > to fprintf return -2.
> > > And I removed cursor movement from that function.
> >
> > Ouch, not 2 but 3, to include a single whitespace at the beginning.
> >
> > The rest looks good.  I hope we're almost at the finish line.
>
> Maybe.

String is end by '\0'. So (len -2) is OK.
However it seems like mistake because fprintf output string and 3 characters.
So I added explanation here and changed (len -2) to (len -3 +1).
I think it is OK because I found similar style in ecpg code.

> > + pqTraceOutputR(const char *message, FILE *f) {
> > +     int    cursor = 0;
> > +
> > +     pqTraceOutputInt32(message, &cursor, f);
> >
> > I don't understand the reason for spliting message and &cursor here.
> >
> > + pqTraceOutputR(const char *message, FILE *f) {
> > +     char *p = message;
> > +
> > +     pqTraceOutputInt32(&p, f);
> >
> > works well.
>
> Yes, that would also work.  But I like the separate cursor + fixed starting
> point here, probably because it's sometimes confusing to see the pointer
> value changed inside functions.  (And a pointer itself is an allergy for some
> people, not to mention a pointer to ointer...)  Also, libpq uses cursors for
> network I/O buffers.  So, I think the patch can be as it is now.

I think pass message and cursor is better. Because it is easy to understand and
The moving cursor style is used when libpq execute protocol message.
So I didn't make this change.

> +/* RowDescription */
> +static void
> +pqTraceOutputT(const char *message, int end, FILE *f) {
> +    int    cursor = 0;
> +    int nfields;
> +    int    i;
> +
> +    nfields = pqTraceOutputInt16(message, &cursor, f);
> +
> +    for (i = 0; i < nfields; i++)
> +    {
> +        pqTraceOutputString(message, &cursor, end, f);
> +        pqTraceOutputInt32(message, &cursor, f);
> +        pqTraceOutputInt16(message, &cursor, f);
> +        pqTraceOutputInt32(message, &cursor, f);
> +        pqTraceOutputInt16(message, &cursor, f);
> +        pqTraceOutputInt32(message, &cursor, f);
> +        pqTraceOutputInt16(message, &cursor, f);
> +    }
> +}
>
> I didn't looked closer, but lookong the usage of the variable "end", something's
> wrong in the function.

I removed end from the function.
pqTraceOutputString no longer use message end cursor.



Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Iwata-san,


Thanks to removing the static arrays, the code got much smaller.  I'm happy with this result.


(1)
+      (<literal>></literal> for messages from client to server,
+      and <literal><</literal> for messages from server to client),

I think this was meant to say "> or <".  So, maybe you should remove "," at the end of the first line, and replace
"and"with "or". 


(2)
+        case 8 :    /* GSSENCRequest or SSLRequest */
+            /* These messsage does not reach here. */

messsage does -> messages do


(3)
+    fprintf(f, "\tAuthentication");

Why don't you move this \t in each message type to the end of:

+    fprintf(conn->Pfdebug, "%s\t%s\t%d", timestr, prefix, length);

Plus, don't we say in the manual that fields (timestamp, direction, length, message type, and message content) are
separatedby a tab? 
Also, the code doesn't seem to separate the message type and content with a tab.


(4)
Where did the processing for unknown message go in pqTraceOutputMessage()?  Add default label?


(5)
+        case 16:    /* CancelRequest */
+            fprintf(conn->Pfdebug, "%s\t>\t%d\tCancelRequest", timestr, length);

I think this should follow pqTraceOutputMessage().  That is, each case label only print the message type name in case
othermessage types are added in the future. 



Regards
Takayuki Tsunakawa





RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Tsunakawa san,

Thank you for your review. I update patch to v29.

Output is following. It is fine.
```
2021-03-19 07:21:09.917302  >   4   Terminate
2021-03-19 07:21:09.961494  >   155 Query    "CREATE TABLESPACE regress_tblspacewith LOCATION
'/home/iwata/pgsql/postgres/src/test/regress/testtablespace'WITH (some_nonexistent_parameter = true);" 
2021-03-19 07:21:09.962657  <   124 ErrorResponse    S "ERROR" V "ERROR" C "22023" M "unrecognized parameter
"some_nonexistent_parameter""F "reloptions.c" L "1456" R "parseRelOptionsInternal" \x00 "Z" 
2021-03-19 07:21:09.962702  <   5   ReadyForQuery    I
2021-03-19 07:21:09.962767  >   144 Query    "CREATE TABLESPACE regress_tblspacewith LOCATION
'/home/iwata/pgsql/postgres/src/test/regress/testtablespace'WITH (random_page_cost = 3.0);" 
2021-03-19 07:21:09.967056  <   22  CommandComplete  "CREATE TABLESPACE"
2021-03-19 07:21:09.967092  <   5   ReadyForQuery    I
2021-03-19 07:21:09.967148  >   81  Query    "SELECT spcoptions FROM pg_tablespace WHERE spcname =
'regress_tblspacewith';"
2021-03-19 07:21:09.970338  <   35  RowDescription   1 "spcoptions" 1213 5 1009 65535 -1 0
2021-03-19 07:21:09.970402  <   32  DataRow  1 22 '{random_page_cost=3.0}'
2021-03-19 07:21:09.970420  <   13  CommandComplete  "SELECT 1"
2021-03-19 07:21:09.970431  <   5   ReadyForQuery    I
2021-03-19 07:21:09.970558  >   42  Query    "DROP TABLESPACE regress_tblspacewith;"
2021-03-19 07:21:09.971500  <   20  CommandComplete  "DROP TABLESPACE"
2021-03-19 07:21:09.971561  <   5   ReadyForQuery    I
```

> -----Original Message-----
> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Friday, March 19, 2021 11:37 AM


> Thanks to removing the static arrays, the code got much smaller.  I'm happy
> with this result.

Thank you so much. Your advice was very helpful in refactoring the patch.

> (1)
> +      (<literal>></literal> for messages from client to server,
> +      and <literal><</literal> for messages from server to client),
>
> I think this was meant to say "> or <".  So, maybe you should remove "," at
> the end of the first line, and replace "and" with "or".

I fixed.

>
>
> (2)
> +        case 8 :    /* GSSENCRequest or SSLRequest */
> +            /* These messsage does not reach here. */
>
> messsage does -> messages do

I fixed.

> (3)
> +    fprintf(f, "\tAuthentication");
>
> Why don't you move this \t in each message type to the end of:
>
> +    fprintf(conn->Pfdebug, "%s\t%s\t%d", timestr, prefix, length);

I fixed.

>
> Plus, don't we say in the manual that fields (timestamp, direction, length,
> message type, and message content) are separated by a tab?

Sure. I added following documentation;
+ Non-message contents fields (timestamp, direction, length and message type)
+ are separated by a tab. Message contents are separated by a space.

> Also, the code doesn't seem to separate the message type and content with a
> tab.

I fixed to output a tab at the end of message types.

> (4)
> Where did the processing for unknown message go in
> pqTraceOutputMessage()?  Add default label?


> (5)
> +        case 16:    /* CancelRequest */
> +            fprintf(conn->Pfdebug,
> "%s\t>\t%d\tCancelRequest", timestr, length);
>
> I think this should follow pqTraceOutputMessage().  That is, each case label
> only print the message type name in case other message types are added in
> the future.

Sure. I fixed pqTraceOutputNoTypeByteMessage() following to pqTraceOutputMessage() style.


Regards,
Aya Iwata

Attachment

RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
Alvaro-san, Iwata-san,
cc: Tom-san, Horiguchi-san,


Thank you, it finally looks complete to me!

Alvaro-san,
We appreciate your help and patience, sometimes rewriting large part of the patch.  Could you do the final check (and
possiblytweak) for commit? 



    Regards
Takayuki Tsunakawa




Re: libpq debug log

From
"alvherre@alvh.no-ip.org"
Date:
Hello

I added an option to the new libpq_pipeline program that it activates
libpq trace.  It works nicely and I think we can add that to the
regression tests.  However I have two observations.

1. The trace output for the error message won't be very nice, because it
prints line numbers.  So if I want to match the output to an "expected"
file, it would break every time somebody edits the source file where the
error originates and the ereport() line is moved.  For example:

    <    70    ErrorResponse     S "ERROR" V "ERROR" C "22012" M "division by zero" F "numeric.c" L "8366" R "div_var"
\x00"Z"
 

The line number 8366 in this line would be problematic.  I think if we
want regression testing for this, we should have another trace flag to
suppress output of a few of these fields.  I would have it print only S,
C and M.

(Hey, what the heck is that "Z" at the end of the message?  I thought
the error ended right at the \x00 ...)


2. The < and > characters are not good for visual inspection.  I
replaced them with F and B and I think it's much clearer.  Compare:

    >    27    Query     "SET lc_messages TO "C""
    <    8    CommandComplete     "SET"
    <    5    ReadyForQuery     I
    >    21    Parse     "" "SELECT $1" 1 23
    >    19    Bind     "" "" 0 1 1 '1' 1 0
    >    6    Describe     P ""
    >    9    Execute     "" 0
    >    4    Sync
    <    4    ParseComplete
    <    4    BindComplete
    <    33    RowDescription     1 "?column?" 0 0 23 4 -1 0
    <    11    DataRow     1 1 '1'
    <    13    CommandComplete     "SELECT 1"
    <    5    ReadyForQuery     I
    >    4    Terminate

with

    F    27    Query     "SET lc_messages TO "C""
    B    8    CommandComplete     "SET"
    B    5    ReadyForQuery     I
    F    21    Parse     "" "SELECT $1" 1 23
    F    19    Bind     "" "" 0 1 1 '1' 1 0
    F    6    Describe     P ""
    F    9    Execute     "" 0
    F    4    Sync
    B    4    ParseComplete
    B    4    BindComplete
    B    33    RowDescription     1 "?column?" 0 0 23 4 -1 0
    B    11    DataRow     1 1 '1'
    B    13    CommandComplete     "SELECT 1"
    B    5    ReadyForQuery     I
    F    4    Terminate

I think the second one is much easier on the eye.

(This one is the output from "libpq_pipeline simple_pipeline").

-- 
Álvaro Herrera       Valdivia, Chile
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)



Re: libpq debug log

From
"alvherre@alvh.no-ip.org"
Date:
Proposed changes on top of v29.

-- 
Álvaro Herrera       Valdivia, Chile

Attachment

Re: libpq debug log

From
"alvherre@alvh.no-ip.org"
Date:
On 2021-Mar-26, alvherre@alvh.no-ip.org wrote:

> Proposed changes on top of v29.

This last one uses libpq_pipeline -t and verifies the output against an
expected trace file.  Applies on top of all the previous patches.  I
attach the whole lot, so that the CF bot has a chance to run it.

I did notice another problem for comparison of expected trace files,
which is that RowDescription includes table OIDs for some columns.  I
think we would need to have a flag to suppress that too, somehow,
although the answer to what should we do is not as clear as for the
other two cases.

I dodged the issue by just using -t for the pipeline cases that don't
have OIDs in their output.  This leaves us with no coverage for the
ErrorResponse message, which I think is a shortcoming.  If we fixed the
OID problem and the ErrorResponse problem, we could add an expected
trace for pipeline_abort.  I think we should do that.

Maybe the easiest way is to have a new flag PQTRACE_REGRESS_MODE.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)

Attachment

Re: libpq debug log

From
"alvherre@alvh.no-ip.org"
Date:
On 2021-Mar-27, alvherre@alvh.no-ip.org wrote:

> This last one uses libpq_pipeline -t and verifies the output against an
> expected trace file.  Applies on top of all the previous patches.  I
> attach the whole lot, so that the CF bot has a chance to run it.

All tests pass, but CFbot does not run src/test/modules, so it's not
saying much.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: alvherre@alvh.no-ip.org <alvherre@alvh.no-ip.org>
> I added an option to the new libpq_pipeline program that it activates
> libpq trace.  It works nicely and I think we can add that to the
> regression tests.

That's good news.  Thank you.



> 1. The trace output for the error message won't be very nice, because it
> prints line numbers.  So if I want to match the output to an "expected"
> file, it would break every time somebody edits the source file where the
> error originates and the ereport() line is moved.  For example:

> (Hey, what the heck is that "Z" at the end of the message?  I thought
> the error ended right at the \x00 ...)

We'll investigate these issues.


> 2. The < and > characters are not good for visual inspection.  I
> replaced them with F and B and I think it's much clearer.  Compare:
> I think the second one is much easier on the eye.

Yes, agreed.  I too thought of something like "C->S" and "S->C" because client and server should be more familiar for
usersthan frontend and backend.
 


Regards
Takayuki Tsunakawa



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: alvherre@alvh.no-ip.org <alvherre@alvh.no-ip.org>
> > Proposed changes on top of v29.
> 
> This last one uses libpq_pipeline -t and verifies the output against an expected
> trace file.  Applies on top of all the previous patches.  I attach the whole lot,
> so that the CF bot has a chance to run it.

Thank you for polishing the patch.

Iwata-san,
Please review Alvaro-san's code, and I think you can integrate all patches into one except for 0002 and 0007.  Those
twopatches may be separate or merged into one as a test patch.
 


> I did notice another problem for comparison of expected trace files, which is
> that RowDescription includes table OIDs for some columns.  I think we would
> need to have a flag to suppress that too, somehow, although the answer to what
> should we do is not as clear as for the other two cases.

I'm afraid this may render the test comparison almost impossible.  Tests that access system catalogs and large objects
probablyoutput OIDs.
 



Regards
Takayuki Tsunakawa



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Mar-29, tsunakawa.takay@fujitsu.com wrote:

> > (Hey, what the heck is that "Z" at the end of the message?  I thought
> > the error ended right at the \x00 ...)
> 
> We'll investigate these issues.

For what it's worth, I did fix this problem in patch 0005 that I
attached.  The problem was that one "continue" should have been "break",
and also a "while ( .. )" needed to be made an infinite loop.  It was
easy to catch these problems once I added (in 0006) the check that the
bytes consumed equal message length, as I had suggested a couple of
weeks ago :-)  I also changed the code for Notice, but I didn't actually
verify that one.

> > 2. The < and > characters are not good for visual inspection.  I
> > replaced them with F and B and I think it's much clearer.  Compare:
> > I think the second one is much easier on the eye.
> 
> Yes, agreed.  I too thought of something like "C->S" and "S->C"
> because client and server should be more familiar for users than
> frontend and backend.

Hmm, yeah, that's a reasonable option too.  What do others think?

-- 
Álvaro Herrera       Valdivia, Chile



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Mon, 29 Mar 2021 00:02:58 -0300, "'alvherre@alvh.no-ip.org'" <alvherre@alvh.no-ip.org> wrote in 
> On 2021-Mar-29, tsunakawa.takay@fujitsu.com wrote:
> 
> > > (Hey, what the heck is that "Z" at the end of the message?  I thought
> > > the error ended right at the \x00 ...)
> > 
> > We'll investigate these issues.
> 
> For what it's worth, I did fix this problem in patch 0005 that I
> attached.  The problem was that one "continue" should have been "break",
> and also a "while ( .. )" needed to be made an infinite loop.  It was
> easy to catch these problems once I added (in 0006) the check that the
> bytes consumed equal message length, as I had suggested a couple of
> weeks ago :-)  I also changed the code for Notice, but I didn't actually
> verify that one.
> 
> > > 2. The < and > characters are not good for visual inspection.  I
> > > replaced them with F and B and I think it's much clearer.  Compare:
> > > I think the second one is much easier on the eye.
> > 
> > Yes, agreed.  I too thought of something like "C->S" and "S->C"
> > because client and server should be more familiar for users than
> > frontend and backend.
> 
> Hmm, yeah, that's a reasonable option too.  What do others think?

It's better to be short as far as it is clear enough. Actually '<' to
'F' and '>' to 'B' is clear enough to me. So I don't need a longer
notation.  O(ut) and (I)n also makes sense to me.  Rather, "C->S", and
"S->C" are a little difficult to understand at a glance

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> It's better to be short as far as it is clear enough. Actually '<' to
> 'F' and '>' to 'B' is clear enough to me. So I don't need a longer
> notation.

Agreed, because the message format description in the PG manual uses F and B.


Regards
Takayuki Tsunakawa





RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: 'alvherre@alvh.no-ip.org' <alvherre@alvh.no-ip.org>
> > > (Hey, what the heck is that "Z" at the end of the message?  I thought
> > > the error ended right at the \x00 ...)
> >
> > We'll investigate these issues.
> 
> For what it's worth, I did fix this problem in patch 0005 that I
> attached.  The problem was that one "continue" should have been "break",
> and also a "while ( .. )" needed to be made an infinite loop.  It was
> easy to catch these problems once I added (in 0006) the check that the
> bytes consumed equal message length, as I had suggested a couple of
> weeks ago :-)  I also changed the code for Notice, but I didn't actually
> verify that one.

You already fix the issue, didn't you?  Thank you.  Your suggestion of checking the length has surely proved to be
correct.


Regards
Takayuki Tsunakawa



RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Alvaro san, Tsunakawa san

Thank you for creating the v30 patch.

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> Sent: Monday, March 29, 2021 9:45 AM
...
> Iwata-san,
> Please review Alvaro-san's code, and I think you can integrate all patches into
> one except for 0002 and 0007.  Those two patches may be separate or
> merged into one as a test patch.

I reviewed v30 patches. I think it was good except that the documentation about the direction of the message was not
changing.
I also tried the v30 patch using regression test and it worked fine.
I merged v30 patches and update the patch to v31.

This new version patch includes the fix of libpq.smgl fix and the addition of regression test mode.
In libpq.smgl, the symbol indicating the message direction has been corrected from "<" ">" to "B" "F" in the
documentation.

> From: alvherre@alvh.no-ip.org <alvherre@alvh.no-ip.org>
> Sent: Sunday, March 28, 2021 4:28 AM
...
> Maybe the easiest way is to have a new flag PQTRACE_REGRESS_MODE.

To prepare for regression test, I read Message Formats documentation.
https://www.postgresql.org/docs/current/protocol-message-formats.html

Following protocol messages have values that depend on the code of master at that time;
- BackendKeyData(B) ... includes backend PID and backend private key
- ErrorResponse(B) ... includes error message line number
- FunctionCall(F) ... includes function OID
- NoticeResponse(B) ... includes notice message line number
- NotificationResponse (B) ... includes backend PID
- ParameterDescription  ... includes parameter OID
- Parse(F) ... includes parameter data type OID
- RowDescription(B) ... includes OIDs

I checked status of conn->pqTraceFlags to decide whether output version-dependent messages or not in above protocol
messageoutput functions.
 
In ErrorResponse and NoticeResponse, I skip string type message logging only when field type code is "L".
In my understanding, other field code message type does not depend on version.
So I didn't skip other code type's string messages.
And I also changed description of pqTraceSetFlags()
by changing PQTRACE_SUPPRESS_TIMESTAMPS flag to the PQTRACE_REGRESS_MODE flag.

Output of regress mode is following;

B   124 ErrorResponse    S "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F
"reloptions.c"L R "parseRelOptionsInternal" \x00
 
B   14  ParameterDescription     2   

Output of non-regress mode is following;

2021-03-30 12:55:31.327913  B   124 ErrorResponse    S "ERROR" V "ERROR" C "22023" M "unrecognized parameter
"some_nonexistent_parameter""F "reloptions.c" L "1447" R "parseRelOptionsInternal" \x00
 
2021-03-30 12:56:12.691617  B   14  ParameterDescription     2 25 701

Regards,
Aya Iwata

Attachment

Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
Okay, pushed this patch and the new testing for it based on
libpq_pipeline.  We'll see how the buildfarm likes it.

I made some further changes to the last version; user-visibly, I split
the trace flags in two, keeping the timestamp suppression separate from
the redacting feature for regression testing.

I didn't like the idea of silently skipping the redacted fields, so I
changed the code to print NNNN or SSSS instead.  I also made the
redacting occur in the last mile (pqTraceOutputInt32 / String) rather
that in their callers: it seemed quite odd to advance the cursor in the
"else" branch.

I refactored the duplicate code that appeared for Notice and Error.
In that function, we redact not only the 'L' field (what Iwata-san was
doing) but also 'F' (file) and 'R' (routine) because those things can
move around for reasons that are not interesting to testing this code.

In the libpq_pipeline commit I added 'pipeline_abort' and 'transaction'
to the cases that generate traces, which adds coverage for
NoticeResponse and ErrorResponse.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
So crake failed.  The failure is that it doesn't print the DataRow
messages.  That's quite odd.  We see this in the trace log:

Mar 30 20:05:15 # F    54    Parse     "" "SELECT 1.0/g FROM generate_series(3, -1, -1) g" 0
Mar 30 20:05:15 # F    12    Bind     "" "" 0 0 0
Mar 30 20:05:15 # F    6    Describe     P ""
Mar 30 20:05:15 # F    9    Execute     "" 0
Mar 30 20:05:15 # F    4    Sync
Mar 30 20:05:15 # B    4    ParseComplete
Mar 30 20:05:15 # B    4    BindComplete
Mar 30 20:05:15 # B    33    RowDescription     1 "?column?" NNNN 0 NNNN 65535 -1 0
Mar 30 20:05:15 # B    70    ErrorResponse     S "ERROR" V "ERROR" C "22012" M "division by zero" F "SSSS" L "SSSS" R
"SSSS"\\x00
 
Mar 30 20:05:15 # B    5    ReadyForQuery     I

and the expected is this:

Mar 30 20:05:15 # F    54    Parse     "" "SELECT 1.0/g FROM generate_series(3, -1, -1) g" 0
Mar 30 20:05:15 # F    12    Bind     "" "" 0 0 0
Mar 30 20:05:15 # F    6    Describe     P ""
Mar 30 20:05:15 # F    9    Execute     "" 0
Mar 30 20:05:15 # F    4    Sync
Mar 30 20:05:15 # B    4    ParseComplete
Mar 30 20:05:15 # B    4    BindComplete
Mar 30 20:05:15 # B    33    RowDescription     1 "?column?" NNNN 0 NNNN 65535 -1 0
Mar 30 20:05:15 # B    32    DataRow     1 22 '0.33333333333333333333'
Mar 30 20:05:15 # B    32    DataRow     1 22 '0.50000000000000000000'
Mar 30 20:05:15 # B    32    DataRow     1 22 '1.00000000000000000000'
Mar 30 20:05:15 # B    70    ErrorResponse     S "ERROR" V "ERROR" C "22012" M "division by zero" F "SSSS" L "SSSS" R
"SSSS"\\x00
 
Mar 30 20:05:15 # B    5    ReadyForQuery     I


...

I wonder if this is a libpq pipeline problem rather than a PQtrace
problem.  In that test case we're using the libpq singlerow mode, so we
should see three rows before the error is sent, but for some reason
crake is not doing that.

aborted pipeline... NOTICE:  table "pq_pipeline_demo" does not exist, skipping
ok
got expected ERROR:  cannot insert multiple commands into a prepared statement
got expected division-by-zero
ok 5 - libpq_pipeline pipeline_abort
not ok 6 - pipeline_abort trace match


Other hosts seem to get it right:

# Running: libpq_pipeline -t
/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/libpq_pipeline/tmp_check/log/pipeline_abort.trace
pipeline_abortport=49797 host=/var/folders/md/8mp8j5m5169ccgy11plb4w_80000gp/T/iA9YP9_IHa dbname='postgres' 10000
 
aborted pipeline... NOTICE:  table "pq_pipeline_demo" does not exist, skipping
ok
got expected ERROR:  cannot insert multiple commands into a prepared statement
got row: 0.33333333333333333333
got row: 0.50000000000000000000
got row: 1.00000000000000000000
got expected division-by-zero
ok 5 - libpq_pipeline pipeline_abort
ok 6 - pipeline_abort trace match


-- 
Álvaro Herrera       Valdivia, Chile
"This is what I like so much about PostgreSQL.  Most of the surprises
are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: 'alvherre@alvh.no-ip.org' <alvherre@alvh.no-ip.org>
> Okay, pushed this patch and the new testing for it based on
> libpq_pipeline.  We'll see how the buildfarm likes it.

Thank you very much!  I appreciate you taking your valuable time while I imagine you must be pretty busy with taking
careof other (possibly more important) patches.
 

TBH, when Tom-san suggested drastic change, I was afraid we may not be able to complete this in PG 14.  But in the end,
I'mvery happy that the patch has become much leaner and cleaner.
 

And congratulations on your first commit, Iwata-san!  I hope you can have time and energy to try adding a connection
parameterto enable tracing, which eliminates application modification.
 


> I didn't like the idea of silently skipping the redacted fields, so I
> changed the code to print NNNN or SSSS instead.  I also made the
> redacting occur in the last mile (pqTraceOutputInt32 / String) rather
> that in their callers: it seemed quite odd to advance the cursor in the
> "else" branch.
> 
> I refactored the duplicate code that appeared for Notice and Error.
> In that function, we redact not only the 'L' field (what Iwata-san was
> doing) but also 'F' (file) and 'R' (routine) because those things can
> move around for reasons that are not interesting to testing this code.
> 
> In the libpq_pipeline commit I added 'pipeline_abort' and 'transaction'
> to the cases that generate traces, which adds coverage for
> NoticeResponse and ErrorResponse.

These make sense to me.  Thank you for repeatedly polishing and making the patch better much.



Regards
Takayuki Tsunakawa



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Mar-30, 'alvherre@alvh.no-ip.org' wrote:

> got expected ERROR:  cannot insert multiple commands into a prepared statement
> got expected division-by-zero

Eh?  this is not at all expected, of course, but clearly not PQtrace's
fault.  I'll look tomorrow.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)



RE: libpq debug log

From
"tsunakawa.takay@fujitsu.com"
Date:
From: 'alvherre@alvh.no-ip.org' <alvherre@alvh.no-ip.org>
> > got expected ERROR:  cannot insert multiple commands into a prepared
> statement
> > got expected division-by-zero
> 
> Eh?  this is not at all expected, of course, but clearly not PQtrace's
> fault.  I'll look tomorrow.

Iwata-san and I were starting to investigate the issue.  I guessed the first message was not expected.  Please let us
knowif there's something we can help.
 

(I was amazed that you finally committed this great feature, libpq pipeline, while you are caring about many patches.)


Regards
Takayuki Tsunakawa



Re: libpq debug log

From
Tom Lane
Date:
"'alvherre@alvh.no-ip.org'" <alvherre@alvh.no-ip.org> writes:
> So crake failed.  The failure is that it doesn't print the DataRow
> messages.  That's quite odd.  We see this in the trace log:

I think this is a timing problem that's triggered (on some machines)
by force_parallel_mode = regress.  Looking at spurfowl's latest
failure of this type, the postmaster log shows

2021-03-31 14:34:54.982 EDT [18233:15] 001_libpq_pipeline.pl LOG:  execute <unnamed>: SELECT 1.0/g FROM
generate_series(3,-1, -1) g 
2021-03-31 14:34:54.992 EDT [18234:1] ERROR:  division by zero
2021-03-31 14:34:54.992 EDT [18234:2] STATEMENT:  SELECT 1.0/g FROM generate_series(3, -1, -1) g
2021-03-31 14:34:54.993 EDT [18233:16] 001_libpq_pipeline.pl ERROR:  division by zero
2021-03-31 14:34:54.993 EDT [18233:17] 001_libpq_pipeline.pl STATEMENT:  SELECT 1.0/g FROM generate_series(3, -1, -1) g
2021-03-31 14:34:54.995 EDT [18216:4] LOG:  background worker "parallel worker" (PID 18234) exited with exit code 1
2021-03-31 14:34:54.995 EDT [18233:18] 001_libpq_pipeline.pl LOG:  could not send data to client: Broken pipe
2021-03-31 14:34:54.995 EDT [18233:19] 001_libpq_pipeline.pl FATAL:  connection to client lost

We can see that the division by zero occurred in a parallel worker.
My theory is that in parallel mode, it's uncertain whether the
error will be reported before or after the "preceding" successful
output rows.  So you need to disable parallelism to make this
test case stable.

            regards, tom lane



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Mar-31, Tom Lane wrote:

> I think this is a timing problem that's triggered (on some machines)
> by force_parallel_mode = regress.  Looking at spurfowl's latest
> failure of this type, the postmaster log shows
> 
> 2021-03-31 14:34:54.982 EDT [18233:15] 001_libpq_pipeline.pl LOG:  execute <unnamed>: SELECT 1.0/g FROM
generate_series(3,-1, -1) g
 
> 2021-03-31 14:34:54.992 EDT [18234:1] ERROR:  division by zero
> 2021-03-31 14:34:54.992 EDT [18234:2] STATEMENT:  SELECT 1.0/g FROM generate_series(3, -1, -1) g
> 2021-03-31 14:34:54.993 EDT [18233:16] 001_libpq_pipeline.pl ERROR:  division by zero
> 2021-03-31 14:34:54.993 EDT [18233:17] 001_libpq_pipeline.pl STATEMENT:  SELECT 1.0/g FROM generate_series(3, -1, -1)
g
> 2021-03-31 14:34:54.995 EDT [18216:4] LOG:  background worker "parallel worker" (PID 18234) exited with exit code 1
> 2021-03-31 14:34:54.995 EDT [18233:18] 001_libpq_pipeline.pl LOG:  could not send data to client: Broken pipe
> 2021-03-31 14:34:54.995 EDT [18233:19] 001_libpq_pipeline.pl FATAL:  connection to client lost
> 
> We can see that the division by zero occurred in a parallel worker.
> My theory is that in parallel mode, it's uncertain whether the
> error will be reported before or after the "preceding" successful
> output rows.  So you need to disable parallelism to make this
> test case stable.

Ooh, that makes sense, thanks.  Added a SET in the program itself to set
it to off.

This is not the *only* issue though; at least animal drongo shows a
completely different failure, where the last few tests don't even get to
run, and the server log just has this:

2021-03-31 20:20:14.460 UTC [178364:4] 001_libpq_pipeline.pl LOG:  disconnection: session time: 0:00:00.469
user=pgrunnerdatabase=postgres host=127.0.0.1 port=52638
 
2021-03-31 20:20:14.681 UTC [178696:1] [unknown] LOG:  connection received: host=127.0.0.1 port=52639
2021-03-31 20:20:14.687 UTC [178696:2] [unknown] LOG:  connection authorized: user=pgrunner database=postgres
application_name=001_libpq_pipeline.pl
2021-03-31 20:20:15.157 UTC [178696:3] 001_libpq_pipeline.pl LOG:  could not receive data from client: An existing
connectionwas forcibly closed by the remote host.
 

I suppose the program is crashing for some reason.

-- 
Álvaro Herrera       Valdivia, Chile
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)



Re: libpq debug log

From
Tom Lane
Date:
"'alvherre@alvh.no-ip.org'" <alvherre@alvh.no-ip.org> writes:
> This is not the *only* issue though; at least animal drongo shows a
> completely different failure, where the last few tests don't even get to
> run, and the server log just has this:

That is weird - only test 4 (of 8) runs at all, the rest seem to
fail to connect.  What's different about pipelined_insert?

            regards, tom lane



Re: libpq debug log

From
Tom Lane
Date:
I wrote:
> That is weird - only test 4 (of 8) runs at all, the rest seem to
> fail to connect.  What's different about pipelined_insert?

Oh ... there's a pretty obvious theory.  pipelined_insert is
the only one that is not asked to write a trace file.
So for some reason, opening the trace file fails.
(I wonder why we don't see an error message for this though.)

            regards, tom lane



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Mar-31, Tom Lane wrote:

> I wrote:
> > That is weird - only test 4 (of 8) runs at all, the rest seem to
> > fail to connect.  What's different about pipelined_insert?
> 
> Oh ... there's a pretty obvious theory.  pipelined_insert is
> the only one that is not asked to write a trace file.
> So for some reason, opening the trace file fails.
> (I wonder why we don't see an error message for this though.)

.. oh, I think we forgot to set conn->Pfdebug = NULL when creating the
connection.  So when we do PQtrace(), the first thing it does is
PQuntrace(), and then that tries to do fflush(conn->Pfdebug) ---> crash.
So this should fix it.

-- 
Álvaro Herrera       Valdivia, Chile
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
       more or less, right?
<crab> i.e., "deadly poison"

Attachment

Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Mar-31, 'alvherre@alvh.no-ip.org' wrote:

> .. oh, I think we forgot to set conn->Pfdebug = NULL when creating the
> connection.  So when we do PQtrace(), the first thing it does is
> PQuntrace(), and then that tries to do fflush(conn->Pfdebug) ---> crash.
> So this should fix it.

I tried to use MALLOC_PERTURB_ to prove this theory, but I failed at it.
I'll just push this blind and see what happens.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                              (Brian Kernighan)



Re: libpq debug log

From
Tom Lane
Date:
"'alvherre@alvh.no-ip.org'" <alvherre@alvh.no-ip.org> writes:
> On 2021-Mar-31, Tom Lane wrote:
>> So for some reason, opening the trace file fails.
>> (I wonder why we don't see an error message for this though.)

> .. oh, I think we forgot to set conn->Pfdebug = NULL when creating the
> connection.  So when we do PQtrace(), the first thing it does is
> PQuntrace(), and then that tries to do fflush(conn->Pfdebug) ---> crash.
> So this should fix it.

Nope, see the MemSet a few lines further up.  This change seems like
good style, but it won't fix anything --- else we'd have been
having issues with the pre-existing trace logic.

What I suspect is some Windows dependency in the way that
001_libpq_pipeline.pl is setting up the trace output files.
cc'ing Andrew to see if he has any ideas.

            regards, tom lane



Re: libpq debug log

From
Tom Lane
Date:
I wrote:
> What I suspect is some Windows dependency in the way that
> 001_libpq_pipeline.pl is setting up the trace output files.

While this may have little to do with drongo's issue,
I'm going to take exception to this bit that I see that
the patch added to PQtrace():

    /* Make the trace stream line-buffered */
    setvbuf(debug_port, NULL, _IOLBF, 0);

I do not like that on two grounds:

1. The trace file handle belongs to the calling application,
not to libpq.  I do not think libpq has any business editorializing
on the file's settings.

2. POSIX says that setvbuf() must be done before any other
operation on the file.  Since we have no way to know whether
any output has already been written to the trace file, the
above is a spec violation.


... and as for an actual explanation for drongo's issue,
I'm sort of wondering why libpq_pipeline.c is opening the
trace file in "w+" mode rather than vanilla "w" mode.
That seems unnecessary, and maybe it doesn't work so well
on Windows.

            regards, tom lane



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Mar-31, Tom Lane wrote:

> While this may have little to do with drongo's issue,
> I'm going to take exception to this bit that I see that
> the patch added to PQtrace():
> 
>     /* Make the trace stream line-buffered */
>     setvbuf(debug_port, NULL, _IOLBF, 0);

Mea culpa.  I added this early on because it made PQtrace's use more
comfortable, but I agree that it's a mistake.  Removed.  I put it in
libpq_pipeline.c instead.

> ... and as for an actual explanation for drongo's issue,
> I'm sort of wondering why libpq_pipeline.c is opening the
> trace file in "w+" mode rather than vanilla "w" mode.
> That seems unnecessary, and maybe it doesn't work so well
> on Windows.

well, at least the MSVC documentation claims that it works, but who
knows.  I removed that too.
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-160

-- 
Álvaro Herrera       Valdivia, Chile



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Mar-31, Tom Lane wrote:

> I wrote:
> > That is weird - only test 4 (of 8) runs at all, the rest seem to
> > fail to connect.  What's different about pipelined_insert?
> 
> Oh ... there's a pretty obvious theory.  pipelined_insert is
> the only one that is not asked to write a trace file.
> So for some reason, opening the trace file fails.
> (I wonder why we don't see an error message for this though.)

Eh, so I forgot to strdup(optarg[optind]).  Apparently that works fine
in glibc but other getopt implementations are likely not so friendly.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)



Re: libpq debug log

From
Tom Lane
Date:
"'alvherre@alvh.no-ip.org'" <alvherre@alvh.no-ip.org> writes:
> Eh, so I forgot to strdup(optarg[optind]).  Apparently that works fine
> in glibc but other getopt implementations are likely not so friendly.

Hmm ... interesting theory, but I don't think I buy it, since the
program isn't doing anything that should damage argv[].  I guess
we'll soon find out.

            regards, tom lane



Re: libpq debug log

From
Tom Lane
Date:
BTW, what in the world is this supposed to accomplish?

-                        (long long) rows_to_send);
+                        (1L << 62) + (long long) rows_to_send);

Various buildfarm members are complaining that the shift distance
is more than the width of "long", which I'd just fix with s/L/LL/
except that the addition of the constant seems just wrong to
begin with.

            regards, tom lane



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Apr-01, Tom Lane wrote:

> BTW, what in the world is this supposed to accomplish?
> 
> -                        (long long) rows_to_send);
> +                        (1L << 62) + (long long) rows_to_send);
> 
> Various buildfarm members are complaining that the shift distance
> is more than the width of "long", which I'd just fix with s/L/LL/
> except that the addition of the constant seems just wrong to
> begin with.

It makes the text representation wider, which means some buffer fills up
faster and the program switches from sending to receiving.

-- 
Álvaro Herrera       Valdivia, Chile
"Ni aún el genio muy grande llegaría muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)



Re: libpq debug log

From
Tom Lane
Date:
"'alvherre@alvh.no-ip.org'" <alvherre@alvh.no-ip.org> writes:
> On 2021-Apr-01, Tom Lane wrote:
>> BTW, what in the world is this supposed to accomplish?
>> -                        (long long) rows_to_send);
>> +                        (1L << 62) + (long long) rows_to_send);

> It makes the text representation wider, which means some buffer fills up
> faster and the program switches from sending to receiving.

Hm.  If the actual field value isn't relevant, why bother including
rows_to_send in it?  A constant string would be simpler and much
less confusing as to its purpose.

            regards, tom lane



Re: libpq debug log

From
Tom Lane
Date:
So drongo is still failing, and after a bit of looking around at
other code I finally got hit with the clue hammer.  Per port.h:

 * On Windows, setvbuf() does not support _IOLBF mode, and interprets that
 * as _IOFBF.  To add insult to injury, setvbuf(file, NULL, _IOFBF, 0)
 * crashes outright if "parameter validation" is enabled.  Therefore, in
 * places where we'd like to select line-buffered mode, we fall back to
 * unbuffered mode instead on Windows.  Always use PG_IOLBF not _IOLBF
 * directly in order to implement this behavior.

You want to do the honors?  And do something about that shift bug
while at it.

            regards, tom lane



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Apr-01, Tom Lane wrote:

> "'alvherre@alvh.no-ip.org'" <alvherre@alvh.no-ip.org> writes:
> > On 2021-Apr-01, Tom Lane wrote:
> >> BTW, what in the world is this supposed to accomplish?
> >> -                        (long long) rows_to_send);
> >> +                        (1L << 62) + (long long) rows_to_send);
> 
> > It makes the text representation wider, which means some buffer fills up
> > faster and the program switches from sending to receiving.
> 
> Hm.  If the actual field value isn't relevant, why bother including
> rows_to_send in it?  A constant string would be simpler and much
> less confusing as to its purpose.

Hah, yeah, we could as well do that I guess :-)

-- 
Álvaro Herrera       Valdivia, Chile
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Apr-01, Tom Lane wrote:

> So drongo is still failing, and after a bit of looking around at
> other code I finally got hit with the clue hammer.  Per port.h:
> 
>  * On Windows, setvbuf() does not support _IOLBF mode, and interprets that
>  * as _IOFBF.  To add insult to injury, setvbuf(file, NULL, _IOFBF, 0)
>  * crashes outright if "parameter validation" is enabled.  Therefore, in
>  * places where we'd like to select line-buffered mode, we fall back to
>  * unbuffered mode instead on Windows.  Always use PG_IOLBF not _IOLBF
>  * directly in order to implement this behavior.
> 
> You want to do the honors?  And do something about that shift bug
> while at it.

Ooh, wow ... now that is a silly bug!  Thanks, I'll push the fix in a
minute.

Andrew also noted that the use of command_ok() in the test file eats all
the output and is what is preventing us from seeing it in the first.
I'll try to get rid of that together with the rest.

-- 
Álvaro Herrera       Valdivia, Chile



Re: libpq debug log

From
"'alvherre@alvh.no-ip.org'"
Date:
On 2021-Apr-01, 'alvherre@alvh.no-ip.org' wrote:

> Ooh, wow ... now that is a silly bug!  Thanks, I'll push the fix in a
> minute.

It still didn't fix it!  Drongo is now reporting a difference in the
expected trace -- and the differences all seem to be message lengths.
Now that is pretty mysterious, because the messages themselves are
printed identically.  Perl's output is pretty unhelpful, but I wrote them to a
file and diffed manually; it's attached.

So for example when we expect
! # B    123    ErrorResponse     S "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared
statement"F "SSSS" L "SSSS" R "SSSS" \\x00
 

we actually get
! # B    173    ErrorResponse     S "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared
statement"F "SSSS" L "SSSS" R "SSSS" \\x00
 

[slaps forehead] I suppose the difference must be that the message
length includes the redacted string fields.  So the file in the F file
is 50 chars longer, *kaboom*.  (I'm actually very surprised that this
didn't blow up earlier.)

I'm not sure what to do about this. Maybe the message length for
ErrorResponse/NoticeResponse ought to be redacted too.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



RE: libpq debug log

From
"iwata.aya@fujitsu.com"
Date:
Hi Alvaro san

Thank you for your fix of trace log code.

> From: 'alvherre@alvh.no-ip.org' <alvherre@alvh.no-ip.org>
> Sent: Friday, April 2, 2021 11:30 AM
...
> It still didn't fix it!  Drongo is now reporting a difference in the expected trace --
> and the differences all seem to be message lengths.
> Now that is pretty mysterious, because the messages themselves are printed
> identically.  Perl's output is pretty unhelpful, but I wrote them to a file and diffed
> manually; it's attached.

Do ErrorResponse and NoticeResponse vary from test to test ...?
If so, it seemed difficult to make the trace logs available for regression test. 
I will also investigate the cause of this issue.

Regards,
Aya Iwata

Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Fri, 2 Apr 2021 02:56:44 +0000, "iwata.aya@fujitsu.com" <iwata.aya@fujitsu.com> wrote in 
> Hi Alvaro san
> 
> Thank you for your fix of trace log code.
> 
> > From: 'alvherre@alvh.no-ip.org' <alvherre@alvh.no-ip.org>
> > Sent: Friday, April 2, 2021 11:30 AM
> ...
> > It still didn't fix it!  Drongo is now reporting a difference in the expected trace --
> > and the differences all seem to be message lengths.
> > Now that is pretty mysterious, because the messages themselves are printed
> > identically.  Perl's output is pretty unhelpful, but I wrote them to a file and diffed
> > manually; it's attached.
>
> Do ErrorResponse and NoticeResponse vary from test to test ...?
> If so, it seemed difficult to make the trace logs available for regression test. 
> I will also investigate the cause of this issue.

The redacted fields, F, L and R contained source file, souce line and
source function respectively. It is reasonable guess that the
difference comes from them but I'm not sure how they make a difference
of 50 bytes in length...

Anyway if the length is wrong, we should get an error after emitting
the log line.

>    if (logCursor - 1 != length)
>        fprintf(conn->Pfdebug,
>                "mismatched message length: consumed %d, expected %d\n",
>                logCursor - 1, length);

So, the cheapest measure for regression test would be just making the
length field, while regress is true...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Fri, 2 Apr 2021 02:56:44 +0000, "iwata.aya@fujitsu.com" <iwata.aya@fujitsu.com> wrote in
>> Do ErrorResponse and NoticeResponse vary from test to test ...?
>> If so, it seemed difficult to make the trace logs available for regression test.
>> I will also investigate the cause of this issue.

> The redacted fields, F, L and R contained source file, souce line and
> source function respectively. It is reasonable guess that the
> difference comes from them but I'm not sure how they make a difference
> of 50 bytes in length...

Some compilers include the full path of the source file in F ...

            regards, tom lane



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Fri, 02 Apr 2021 14:40:09 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Fri, 2 Apr 2021 02:56:44 +0000, "iwata.aya@fujitsu.com" <iwata.aya@fujitsu.com> wrote in 
> > Hi Alvaro san
> > 
> > Thank you for your fix of trace log code.
> > 
> > > From: 'alvherre@alvh.no-ip.org' <alvherre@alvh.no-ip.org>
> > > Sent: Friday, April 2, 2021 11:30 AM
> > ...
> > > It still didn't fix it!  Drongo is now reporting a difference in the expected trace --
> > > and the differences all seem to be message lengths.
> > > Now that is pretty mysterious, because the messages themselves are printed
> > > identically.  Perl's output is pretty unhelpful, but I wrote them to a file and diffed
> > > manually; it's attached.
> >
> > Do ErrorResponse and NoticeResponse vary from test to test ...?
> > If so, it seemed difficult to make the trace logs available for regression test. 
> > I will also investigate the cause of this issue.
> 
> The redacted fields, F, L and R contained source file, souce line and
> source function respectively. It is reasonable guess that the
> difference comes from them but I'm not sure how they make a difference
> of 50 bytes in length...
> 
> Anyway if the length is wrong, we should get an error after emitting
> the log line.
> 
> >    if (logCursor - 1 != length)
> >        fprintf(conn->Pfdebug,
> >                "mismatched message length: consumed %d, expected %d\n",
> >                logCursor - 1, length);
> 
> So, the cheapest measure for regression test would be just making the

So, the cheapest measure for regression test would be just *masking* the

> length field, while regress is true...

tired?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Kyotaro Horiguchi
Date:
At Fri, 02 Apr 2021 01:45:06 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > At Fri, 2 Apr 2021 02:56:44 +0000, "iwata.aya@fujitsu.com" <iwata.aya@fujitsu.com> wrote in 
> >> Do ErrorResponse and NoticeResponse vary from test to test ...?
> >> If so, it seemed difficult to make the trace logs available for regression test. 
> >> I will also investigate the cause of this issue.
> 
> > The redacted fields, F, L and R contained source file, souce line and
> > source function respectively. It is reasonable guess that the
> > difference comes from them but I'm not sure how they make a difference
> > of 50 bytes in length...
> 
> Some compilers include the full path of the source file in F ...

Ah. I suspected that but dismissed the possibility looking that on my
environment, gcc8.  That completely makes sense. Thank you!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: libpq debug log

From
Alvaro Herrera
Date:
On 2021-Apr-02, Kyotaro Horiguchi wrote:

> At Fri, 02 Apr 2021 14:40:09 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 

> > So, the cheapest measure for regression test would be just making the
> 
> So, the cheapest measure for regression test would be just *masking* the
> 
> > length field, while regress is true...

Yeah.

> tired?

Same here.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)



Re: libpq debug log

From
Alvaro Herrera
Date:
On 2021-Apr-02, Alvaro Herrera wrote:

> On 2021-Apr-02, Kyotaro Horiguchi wrote:
> 
> > At Fri, 02 Apr 2021 14:40:09 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> 
> > > So, the cheapest measure for regression test would be just making the
> > 
> > So, the cheapest measure for regression test would be just *masking* the
> > 
> > > length field, while regress is true...
> 
> Yeah.

As in the attached patch.

-- 
Álvaro Herrera       Valdivia, Chile
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)

Attachment

Re: libpq debug log

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> As in the attached patch.

+1, but the comment could be more specific.  Maybe like "In regress mode,
suppress the length of ErrorResponse and NoticeResponse messages --- the F
(file name) field, in particular, can vary in length depending on compiler
used."

Actually though ... I recall now that elog.c tries to standardize the F
output by stripping the path part:

        /* keep only base name, useful especially for vpath builds */
        slash = strrchr(filename, '/');
        if (slash)
            filename = slash + 1;

I bet what is happening on drongo is that the compiler has generated a
__FILE__ value that contains backslashes not slashes, and this code
doesn't know how to shorten those.  So maybe instead of lobotomizing
this test, we should fix that.

        slash = strrchr(filename, '/');
+        if (!slash)
+            slash = strrchr(filename, '\\');
        if (slash)
            filename = slash + 1;

            regards, tom lane



Re: libpq debug log

From
Tom Lane
Date:
I wrote:
> I bet what is happening on drongo is that the compiler has generated a
> __FILE__ value that contains backslashes not slashes, and this code
> doesn't know how to shorten those.  So maybe instead of lobotomizing
> this test, we should fix that.

Did that, but we'll have to wait a few hours to see if it makes
drongo happy.

            regards, tom lane



Re: libpq debug log

From
Tom Lane
Date:
I wrote:
>> I bet what is happening on drongo is that the compiler has generated a
>> __FILE__ value that contains backslashes not slashes, and this code
>> doesn't know how to shorten those.  So maybe instead of lobotomizing
>> this test, we should fix that.

> Did that, but we'll have to wait a few hours to see if it makes
> drongo happy.

On third thought, maybe we should push your patch too.  Although I think
53aafdb9f is going to fix the platform-specific aspect of this, we are
still going to risk some implementation dependence of the libpq_pipeline
results:

* Every so often, the number of digits in the reported line numbers
will change (999 -> 1001 or the like), due to changes in unrelated
code.

* Occasionally we refactor things so that the "same" error is reported
from a different file.

It's hard to judge whether that will happen often enough to be an
annoying maintenance problem, but there's definitely a hazard.
Not sure if we should proactively lobotomize the test, or wait to
see if we get annoyed.

In any case I'd like to wait till after drongo's next run, so
we can confirm whether or not the backslashes-in-__FILE__ hypothesis
is correct.  If it is, then 53aafdb9f is a good fix on its own
merits, independently of libpq_pipeline.

            regards, tom lane



Re: libpq debug log

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Apr-02, Tom Lane wrote:
>> +1, but the comment could be more specific.  Maybe like "In regress mode,
>> suppress the length of ErrorResponse and NoticeResponse messages --- the F
>> (file name) field, in particular, can vary in length depending on compiler
>> used."

> With your commit this is no longer true, so how about the attached?

Right.  Your text looks fine.

(I see drongo is now green, so it looks like its compiler does indeed emit
backslash-separated full paths.)

            regards, tom lane



Re: libpq debug log

From
Robert Haas
Date:
On Sat, Jun 5, 2021 at 11:03 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > This added a PQtraceSetFlags() function.  We have a dozen PQset*() functions,
> > but this and PQresultSetInstanceData() are the only PQSomethingSet()
> > functions.  Is it okay if I rename it to PQsetTraceFlags()?  I think that
> > would be more idiomatic for libpq.
>
> Hmm, there is an obvious parallel between PQtrace() and
> PQtraceSetFlags() which is lost with your proposed rename.  I'm not
> wedded to the name though, so I'm just -0.1 on the rename.  If you feel
> strongly about it, I won't oppose it.

I'm +1 for the rename. I think it's a lot more idiomatic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: libpq debug log

From
Noah Misch
Date:
On Mon, Jun 07, 2021 at 11:37:58AM -0400, Robert Haas wrote:
> On Sat, Jun 5, 2021 at 11:03 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > This added a PQtraceSetFlags() function.  We have a dozen PQset*() functions,
> > > but this and PQresultSetInstanceData() are the only PQSomethingSet()
> > > functions.  Is it okay if I rename it to PQsetTraceFlags()?  I think that
> > > would be more idiomatic for libpq.
> >
> > Hmm, there is an obvious parallel between PQtrace() and
> > PQtraceSetFlags() which is lost with your proposed rename.  I'm not
> > wedded to the name though, so I'm just -0.1 on the rename.  If you feel
> > strongly about it, I won't oppose it.
> 
> I'm +1 for the rename. I think it's a lot more idiomatic.

At +1 vs. -0.1, I would have withdrawn the proposal.  Seeing +2 vs. -0.1, I
have pushed it.