Re: libpq should append auth failures, not overwrite - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: libpq should append auth failures, not overwrite
Date
Msg-id alpine.DEB.2.21.1808120938250.6189@lancre
Whole thread Raw
In response to Re: libpq should append auth failures, not overwrite  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: libpq should append auth failures, not overwrite
List pgsql-hackers
Hello Tom,

> Some points for discussion and review:
>
> 1.  The bulk of patch 0001 is indeed just
> s/printfPQExpBuffer/appendPQExpBuffer/g, though I also made an attempt
> to use appendPQExpBufferStr wherever possible.  There are two categories
> of printfPQExpBuffer calls I didn't change:

> 1a. Calls reporting an out-of-memory situation.  There was already a
> policy in some places that we'd intentionally printf not append such
> messages, and I made that explicit.  The idea is that we might not
> have room to add more text to errorMessage, so resetting the buffer
> provides more certainty of success.  However, this makes for a pretty
> weird inconsistency in the code; there are a lot of functions now in
> which half of the messages are printf'd and half are append'd, so I'm
> afraid that future patches will get it wrong as to which to use.  Also,
> in reality there often *would* be room to append "out of memory" without
> enlarging errorMessage's buffer, so that this could just be pointless
> destruction of useful error history.  I didn't do anything about it here,
> but I'm tempted to replace all the printf's for OOM messages with a
> subroutine that will append if it can do so without enlarging the existing
> buffer, else replace.
>
> 1b. There are a few places where it didn't seem worth changing anything
> because it'd just result in needing to add a resetPQExpBuffer in the same
> function or very nearby.  Mostly in fe-lobj.c.
>
> 2. I had to drop some code (notably in pqPrepareAsyncResult) that
> attempted to force conn->errorMessage to always have the same contents
> as the current PGresult's PQresultErrorMessage; as it stood, server
> errors such as "role does not exist" wiped out preceding contents of
> errorMessage, breaking cases such as the second example above.  This is
> slightly annoying, but in the new dispensation it's possible that
> conn->errorMessage contains a mix of libpq-generated errors and text
> received from the server, so I'm not sure that preserving the old
> equivalence is a useful goal anyway.  We shouldn't cram pre-existing
> errorMessage text back into a server-generated error; that would
> invalidate the server's auxiliary error info such as the SQLSTATE.
> I don't know if it's possible to do better here.  One idea is to tweak
> pqPrepareAsyncResult so that it does the message synchronization only
> when in CONNECTION_OK state, allowing conn->errorMessage to be a
> historical record only for connection processing.  But that seems like
> a hack.
>
> 3. In some places I failed to resist the temptation to copy-edit error
> messages that didn't meet style guidelines, or to add libpq_gettext()
> annotation where it seemed to have been missed.
>
> 4. connectDBComplete() cleared errorMessage after a successful
> completion, but that's the wrong place to do it; we have to do that
> at the success exits from PQconnectPoll, else applications that
> use PQconnectPoll directly will see leftover garbage there.
>
> 5. There were places in PQconnectPoll that temporarily set
> conn->status = CONNECTION_OK to prevent PQsendQueryStart from
> complaining.  A better idea is to leave the status alone and change
> that test in PQsendQueryStart to be status != CONNECTION_BAD.  This
> not only simplifies PQconnectPoll, but allows PQsendQueryStart
> to tell whether we're still in the connection sequence.  I fixed
> it to not clear errorMessage when we are, thus solving the problem
> of PQsendQuery destroying prior connection error history.  That allows
> getting rid of saveErrorMessage/restoreErrorMessage, which were
> (a) an incredibly ugly and inefficient hack, and (b) inadequate,
> since there were other paths through PQconnectPoll that could reach
> PQsendQueryStart but were not protected with calls to those.
>
> 6. In the 0002 patch, I have it printing the annotation only when
> there's more than one connhost[], and only once per connhost not
> once per IP address.  The motivation for doing it like that was to
> not change the behavior for simple cases with only one host name.
> There is certainly room to argue that we should make an annotation
> once per IP address, but that'd change the behavior for pretty common
> cases like "localhost" resolving to both 127.0.0.1 and ::1, so I don't
> know if it'd be a good idea or not.  There's also an interaction to
> consider with my patch at
> https://www.postgresql.org/message-id/4913.1533827102%40sss.pgh.pa.us
> namely that we have to be sure the labeling behaves sanely for
> connhosts that fail to resolve any IP addresses.
>
> I'll put this into the September commitfest.

* About the *first* patch

It applies cleanly and compiles.

Alas, global "make check" is ok although there is no change on regression 
tests, which suggest that the multiple error message reporting is not 
tested anywhere. Should there be at least one test?

Getting rid of somehow ugly "let's save the current message and restore it 
later" logic in places looks like a definite improvement.

There are still 86 instances of "printfPQExpBuffer", which seems like a 
lot. They are mostly OOM messages, but not only. This make checking the 
patch quite cumbersome.

I'd rather there would be only one rule, and clear reset with a comment 
when needded (eg "fe-lobj.c").

I agree with your suggestion of adding a function trying to preserve 
messages on OOM errors if possible. Maybe "appendPQExpBufferOOM"?

It is unclear to me why the "printf" variant is used in 
"PQencryptPasswordConn" and "connectOptions2", it looks that you skipped 
these functions.

In passing, some OOM message are rather terse: "out of memory\n", other 
are more precise "out of memory allocating GSSAPI buffer (%d)\n".

I do not think it is worth creating a l10n exception on a should not 
happen message, eg (there are a few):

  -  /* shouldn't happen */
  -  printfPQExpBuffer(&conn->errorMessage,
  -                    libpq_gettext("invalid SCRAM exchange state\n"));
  +  /* shouldn't happen, so we don't translate the msg */
  +  appendPQExpBufferStr(&conn->errorMessage,
  +                       "invalid SCRAM exchange state\n");

I would have let "libpq_gettext" it as it was.

I'm ok of libpq & server messages are mixed, because I do not see why one 
may take precedent on the others, and when some strange & combined errors, 
every clue is worthy.

I have not seen anything which raises a "risky" alarm about undesired 
consequences, but who knows?

Comments update seem ok. It is possible that some were missed.


* About the second patch:

It applies cleanly on top of previous and compiles. Yet again, global 
"make check" is ok although there were no test changes, which just show 
the abysmal coverage of psql regression tests.

Should there be at least one test?

The feature answers some of my issues with the "libpq should not look up 
all host addresses at once" patch.

* "server \"%s\" port %s:\n"

I do not like the resulting output

server:
problem found
   more details

I'd rather have

server: problem found
   more details

which would require to append a '\n' at the beginning of the line from the 
second attempt.

ISTM that both the hostname and ip should be shown to avoid confusion 
about hosts with multiple ips, esp. as ips are given in any order by the 
dns.

The patch misses the one server with multiple ips case. ISTM that the 
introductory line should be shown in such case as well.

  sh> psql "host=local.coelho.net,localhost port=5434"
  psql: server "local.coelho.net" port 5434:
    ...
  server "localhost" port 5434:
   ...

But:

  sh> psql "host=local2.coelho.net port=5434"
  psql: could not connect to server: Connection refused
     ...
  could not connect to server: Connection refused
     ...

Even if the hint message in this case is explicit enough, ISTM that the 
server line is deserved.

Also for homogeneity, I'd suggest to always add the server line. If the 
server introduction is inserted in all cases, including when only one host 
is used, hints become partially redundant:

   server "local.coelho.net" port 5434:
   could not connect to server: Connection refused
         Is the server running on host "local.coelho.net" (127.0.0.1) and accepting
         TCP/IP connections on port 5434?

This would allow to simplify more hints, which you seem to have done on 
"open read-write session" and "SHOW transaction_read_only" but not others.

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: [HACKERS] Optional message to user when terminating/cancellingbackend
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] Optional message to user when terminating/cancelling backend