Thread: libpq: Fix lots of discrepancies in PQtrace

libpq: Fix lots of discrepancies in PQtrace

From
Jelte Fennema-Nio
Date:
After initially trying to add trace support for
StartupMessage/SSLRequest/GSSENCRequest[1] I realized there were many
more cases where PQtrace was not correctly tracing messages, or not
even tracing them at all. These patches fix all of the issues that I
was able to find.

0001 is some cleanup after f4b54e1ed9
0002 does some preparatory changes for 0004 & 0007

All the others improve the tracing, and apart from 0004 and 0007
depending on 0002, none of these patches depend on each other.
Although you could argue that 0007 and 0008 depend on 0006, because
without 0006 the code added by 0007 and 0008 won't ever really be
executed.

To test you can add a PQreset(conn) call to the start of the
test_cancel function in:
src/test/modules/libpq_pipeline/libpq_pipeline.c.

And then run:
ninja -C build all install-quiet &&
build/src/test/modules/libpq_pipeline/libpq_pipeline cancel
'port=5432' -t test.trace

And then look at the top of test.trace

[1]: https://www.postgresql.org/message-id/CAGECzQTTN5aGqtDaRifJXPyd_O5qHWQcOxsHJsDSVNqMugGNEA%40mail.gmail.com

Attachment

Re: libpq: Fix lots of discrepancies in PQtrace

From
Nathan Bossart
Date:
On Fri, Jun 21, 2024 at 11:22:05AM +0200, Jelte Fennema-Nio wrote:
> 0001 is some cleanup after f4b54e1ed9

Oops.  I'll plan on committing this after the 17beta2 release freeze is
lifted.

-- 
nathan



Re: libpq: Fix lots of discrepancies in PQtrace

From
Nathan Bossart
Date:
On Fri, Jun 21, 2024 at 04:01:55PM -0500, Nathan Bossart wrote:
> On Fri, Jun 21, 2024 at 11:22:05AM +0200, Jelte Fennema-Nio wrote:
>> 0001 is some cleanup after f4b54e1ed9
> 
> Oops.  I'll plan on committing this after the 17beta2 release freeze is
> lifted.

Committed 0001.

-- 
nathan



Re: libpq: Fix lots of discrepancies in PQtrace

From
Alvaro Herrera
Date:
On 2024-Jun-26, Nathan Bossart wrote:

> On Fri, Jun 21, 2024 at 04:01:55PM -0500, Nathan Bossart wrote:
> > On Fri, Jun 21, 2024 at 11:22:05AM +0200, Jelte Fennema-Nio wrote:
> >> 0001 is some cleanup after f4b54e1ed9
> > 
> > Oops.  I'll plan on committing this after the 17beta2 release freeze is
> > lifted.
> 
> Committed 0001.

Thanks, Nathan.  I'm holding myself responsible for the rest ... will
handle soon after the branch.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The problem with the future is that it keeps turning into the present"
(Hobbes)



Re: libpq: Fix lots of discrepancies in PQtrace

From
Jelte Fennema-Nio
Date:
On Wed, 26 Jun 2024 at 18:36, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Thanks, Nathan.  I'm holding myself responsible for the rest ... will
> handle soon after the branch.

Sounds great. Out of curiosity, what is the backpatching policy for
something like this? Honestly most of these patches could be
considered bugfixes in PQtrace, so backpatching might make sense. OTOH
I don't think PQtrace is used very much in practice, so backpatching
might carry more risk than it's worth.



Re: libpq: Fix lots of discrepancies in PQtrace

From
Michael Paquier
Date:
On Wed, Jun 26, 2024 at 10:02:08PM +0200, Jelte Fennema-Nio wrote:
> On Wed, 26 Jun 2024 at 18:36, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Thanks, Nathan.  I'm holding myself responsible for the rest ... will
> > handle soon after the branch.
>
> Sounds great. Out of curiosity, what is the backpatching policy for
> something like this? Honestly most of these patches could be
> considered bugfixes in PQtrace, so backpatching might make sense. OTOH
> I don't think PQtrace is used very much in practice, so backpatching
> might carry more risk than it's worth.

0001 getting on HEAD after the feature freeze as a cleanup piece
cleanup is no big deal.  That's cosmetic, still OK.

Looking at the whole, the rest of the patch set qualifies as a new
feature, even if they're aimed at closing existing gaps.
Particularly, you have bits of new infrastructure introduced in libpq
like the current_auth_response business in 0004, making it a new
feature by structure.

+    conn->current_auth_response = AUTH_RESP_PASSWORD;
     ret = pqPacketSend(conn, PqMsg_PasswordMessage, pwd_to_send, strlen(pwd_to_send) + 1);
+    conn->current_auth_response = AUTH_RESP_NONE;

It's a surprising approach.  Future callers of pqPacketSend() and
pqPutMsgEnd() would easily miss that this flag should be set, as much
as reset.  Isn't that something that should be added in input of these
functions?

+    AuthResponseType current_auth_response;
I'd recommend to document what this flag is here for, with a comment.
--
Michael

Attachment

Re: libpq: Fix lots of discrepancies in PQtrace

From
Jelte Fennema-Nio
Date:
On Thu, 27 Jun 2024 at 07:39, Michael Paquier <michael@paquier.xyz> wrote:
> Looking at the whole, the rest of the patch set qualifies as a new
> feature, even if they're aimed at closing existing gaps.

Alright, seems reasonable. To be clear, I don't care at all about this
being backported personally.

> Particularly, you have bits of new infrastructure introduced in libpq
> like the current_auth_response business in 0004, making it a new
> feature by structure.
>
> +       conn->current_auth_response = AUTH_RESP_PASSWORD;
>         ret = pqPacketSend(conn, PqMsg_PasswordMessage, pwd_to_send, strlen(pwd_to_send) + 1);
> +       conn->current_auth_response = AUTH_RESP_NONE;
>
> It's a surprising approach.  Future callers of pqPacketSend() and
> pqPutMsgEnd() would easily miss that this flag should be set, as much
> as reset.  Isn't that something that should be added in input of these
> functions?

Yeah, I'm not entirely happy about it either. But adding an argument
to pqPutMsgEnd and pqPutPacketSend would mean all the existing call
sites would need to change, even though only 4 of them would care
about the new argument. You could argue that it's the better solution,
but it would at least greatly increase the size of the diff. Of course
to reduce the diff size you could make the old functions a wrapper
around a new one with the extra argument, but I couldn't think of a
good name for those functions. Overall I went for the chosen approach
here, because it only impacted code at the call sites for these auth
packets (which are the only v3 packets in the protocol that you cannot
interpret based on their contents alone).

I think your worry about easily missing to set/clear the flag is not a
huge problem in practice. We almost never add new authentication
messages and it's only needed there. Also the clearing is not even
strictly necessary for the tracing to behave correctly, but it seemed
like the right thing to do.

> +       AuthResponseType current_auth_response;
> I'd recommend to document what this flag is here for, with a comment.

Oops, yeah I forgot about that. Done now.

Attachment

Re: libpq: Fix lots of discrepancies in PQtrace

From
Alvaro Herrera
Date:
Pushed 0002 and 0003.  On the latter: I decided against using int32 to
print the request identifiers; by splitting into two int16's, we see
that the numbers match the values in the PG_PROTOCOL() declarations:

2024-08-09 17:37:38.364622    F    8    SSLRequest     1234 5679
and
2024-08-09 17:37:38.422109    F    16    CancelRequest     1234 5678 NNNN NNNN

(I didn't verify GSSEncRequest directly.)

I also verified that in non-regress mode, the values printed by
CancelRequest match those in the BackendKeyData message,
2024-08-09 17:34:27.544686    B    12    BackendKeyData     NNNN NNNN

I also added suppression in regress mode for the backend key in the
CancelRequest message, since they would be different each time.

There are no tests for this code.  We could add a trace file for the
connection packet in libpq_pipeline by changing PQconnectdb() to
PQconnectStart() and then do PQtrace before polling until the connection
is ready; we would have to have it match for the TAP test.  Not sure
this is worth the effort.  But doing this in a very crude way allowed me
to verify that, at least on my machine, this code is doing what's
expected.

Thank you,

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: libpq: Fix lots of discrepancies in PQtrace

From
Alvaro Herrera
Date:
Regarding 0004:

I don't want to add 4 bytes to struct pg_conn for tracing support.  I'm
tempted to make the new struct member a plain 'char' to reduce overhead
for a feature that almost nobody is going to use.  According to pahole
we have a 3 bytes hole in that position of the struct, so if we make it
a 1- or 2-byte member, there's no storage overhead whatsoever.

Also, why not have pqTraceOutputMessage() responsible for resetting the
byte after printing the message?  It seems to cause less undesirable
detritus.

I propose something like the attached, but it's as yet untested.  What
do you think?

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)

Attachment

Re: libpq: Fix lots of discrepancies in PQtrace

From
Jelte Fennema-Nio
Date:
On Sat, 10 Aug 2024 at 01:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I don't want to add 4 bytes to struct pg_conn for tracing support.  I'm
> tempted to make the new struct member a plain 'char' to reduce overhead
> for a feature that almost nobody is going to use.  According to pahole
> we have a 3 bytes hole in that position of the struct, so if we make it
> a 1- or 2-byte member, there's no storage overhead whatsoever.

Sounds fine to me.

> Also, why not have pqTraceOutputMessage() responsible for resetting the
> byte after printing the message?  It seems to cause less undesirable
> detritus.

Yeah, that's indeed much nicer.

> I propose something like the attached, but it's as yet untested.  What
> do you think?

Looks good, but I haven't tested it yet either.



Re: libpq: Fix lots of discrepancies in PQtrace

From
Alvaro Herrera
Date:
On 2024-Aug-10, Jelte Fennema-Nio wrote:

> On Sat, 10 Aug 2024 at 01:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> > I propose something like the attached, but it's as yet untested.  What
> > do you think?
> 
> Looks good, but I haven't tested it yet either.

I tested the SASL exchange and it looks OK.  Didn't test the other ones.

Thanks!

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: libpq: Fix lots of discrepancies in PQtrace

From
Jelte Fennema-Nio
Date:
On Wed, 14 Aug 2024 at 19:37, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> - to 0005 I change your pqTraceOutputEncryptionRequestResponse()
>   function name to pqTraceOutputCharResponse and instead of attaching
>   the "Response" literal in the outpuer to the name given in the
>   function call, just pass the whole string as argument to the function.

Fine by me

> - to 0006 I change function name pqFinishParsingMessage() to
>   pqParseDone() and reworded the commentary; also moved it to fe-misc.c.
>   Looks good otherwise.

The following removed comments seems useful to keep (I realize I
already removed them in a previous version of the patch, but I don't
think I did that on purpose)

-       /* Drop the processed message and loop around for another */

-               /* consume the message and exit */


-       /* Completed this message, keep going */
-       /* trust the specified message length as what to skip */


> - 0008 to fix NegotiateProtocolVersion looks correct per [1], but I
>   don't know how to test it.  Suggestions?

Two options:
1. Manually change code to make sure SendNegotiateProtocolVersion is
called in src/backend/tcop/backend_startup.c
2. Apply my patches from this thread[2] and use
max_protocol_version=latest in the connection string while connecting
to an older postgres server.

[2]:
https://www.postgresql.org/message-id/flat/CAGECzQTyXDNtMXdq2L-Wp%3DOvOCPa07r6%2BU_MGb%3D%3Dh90MrfT%2BfQ%40mail.gmail.com#1b8cda3523555aafae89cc04293b8613



Re: libpq: Fix lots of discrepancies in PQtrace

From
Alvaro Herrera
Date:
Hello,

On 2024-Aug-14, Jelte Fennema-Nio wrote:

> The following removed comments seems useful to keep (I realize I
> already removed them in a previous version of the patch, but I don't
> think I did that on purpose)
> [...]

Ah, yeah, I agree.  I put them back, and pushed 0005, 6 and 7 as a
single commit.  It didn't seem worth pushing each separately, really.  I
added two lines for the CopyData message as well, since otherwise the
output shows the "mismatched length" error when getting COPY data.

I'm leaving 0008 to whoever is doing the NegotiateProtocolVersion stuff;
maybe post that one in that thread you mentioned.  I'll mark this CF
entry committed.

Many thanks!

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/