pgsql: Fix assorted issues in parallel vacuumdb. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix assorted issues in parallel vacuumdb.
Date
Msg-id E1f2N7f-0003Lp-T5@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix assorted issues in parallel vacuumdb.

Avoid storing the result of PQsocket() in a pgsocket variable; it's
declared as int, and the no-socket test is properly written as "x < 0"
not "x == PGINVALID_SOCKET".  This accidentally had no bad effect
because we never got to init_slot() with a bad connection, but it's
still wrong.

Actually, it seems like we should avoid storing the result for a long
period at all.  The function's not so expensive that it's worth avoiding,
and the existing coding technique here would fail if anyone tried to
PQreset the connection during the life of the program.  Hence, just
re-call PQsocket every time we construct a select(2) mask.

Speaking of select(), GetIdleSlot imagined that it could compute the
select mask once and continue to use it over multiple calls to
select_loop(), which is pretty bogus since that would stomp on the
mask on return.  This could only matter if the function's outer loop
iterated more than once, which is unlikely (it'd take some connection
receiving data, but not enough to complete its command).  But if it
did happen, we'd acquire "tunnel vision" and stop watching the other
connections for query termination, with the effect of losing parallelism.

Another way in which GetIdleSlot could lose parallelism is that once
PQisBusy returns false, it would lock in on that connection and do
PQgetResult until that returns NULL; in some cases that could result
in blocking.  (Perhaps this can never happen in vacuumdb due to the
limited set of commands that it can issue, but I'm not quite sure
of that, and even if true today it's not a future-proof assumption.)
Refactor the code to do that properly, so that it risks blocking in
PQgetResult only in cases where we need to wait anyway.

Another loss-of-parallelism problem, which *is* easily demonstrable,
is that any setup queries issued during prepare_vacuum_command() were
always issued on the last-to-be-created connection, whether or not
that was idle.  Long-running operations on that connection thus
prevented issuance of additional operations on the other ones, except
in the limited cases where no preparatory query was needed.  Instead,
wait till we've identified a free connection and use that one.

Also, avoid core dump due to undersized malloc request in the case
that no tables are identified to be vacuumed.

The bogus no-socket test was noted by CharSyam, the other problems
identified in my own code review.  Back-patch to 9.5 where parallel
vacuumdb was introduced.

Discussion: https://postgr.es/m/CAMrLSE6etb33-192DTEUGkV-TsvEcxtBDxGWG1tgNOMnQHwgDA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/94173d3eeb445ae49020c5e2d9e166856d923eaf

Modified Files
--------------
src/bin/scripts/vacuumdb.c | 192 +++++++++++++++++++++++++++------------------
1 file changed, 114 insertions(+), 78 deletions(-)


pgsql-committers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: pgsql: Set random seed for pgbench.
Next
From: Andrew Dunstan
Date:
Subject: pgsql: Small cleanups in fast default code.