Re: Transaction timeout - Mailing list pgsql-hackers
From | Andrey Borodin |
---|---|
Subject | Re: Transaction timeout |
Date | |
Msg-id | CAAhFRxg9i6DrHT1cALBNztsnJ7OR3cawfCmyFA=J6=oDFX+h2Q@mail.gmail.com Whole thread Raw |
In response to | Re: Transaction timeout (Andrey Borodin <amborodin86@gmail.com>) |
Responses |
Re: Transaction timeout
Re: Transaction timeout |
List | pgsql-hackers |
On Wed, Dec 7, 2022 at 1:30 PM Andrey Borodin <amborodin86@gmail.com> wrote: > I hope to address other feedback on the weekend. Andres, here's my progress on working with your review notes. > > @@ -3277,6 +3282,7 @@ ProcessInterrupts(void) > > */ > > lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true); > > stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true); > > + tx_timeout_occurred = get_timeout_indicator(TRANSACTION_TIMEOUT, true); > > > > /* > > * If both were set, we want to report whichever timeout completed > > This doesn't update the preceding comment, btw, which now reads oddly: I've rewritten this part to correctly report all timeouts that did happen. However there's now a tricky comma-formatting code which was tested only manually. > > > > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void) > > > > SetLatch(MyLatch); > > > > } > > > > > > > > +static void > > > > +TransactionTimeoutHandler(void) > > > > +{ > > > > +#ifdef HAVE_SETSID > > > > + /* try to signal whole process group */ > > > > + kill(-MyProcPid, SIGINT); > > > > +#endif > > > > + kill(MyProcPid, SIGINT); > > > > +} > > > > + > > > > > > Why does this use signals instead of just setting the latch like > > > IdleInTransactionSessionTimeoutHandler() etc? > > > > I just copied statement_timeout behaviour. As I understand this > > implementation is prefered if the timeout can catch the backend > > running at full steam. > > Hm. I'm not particularly convinced by that code. Be that as it may, I > don't think it's a good idea to have one more copy of this code. At > least the patch should wrap the signalling code in a helper. Done, now there is a single CancelOnTimeoutHandler() handler. > > > > diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c > > > > index 0081873a72..5229fe3555 100644 > > > > --- a/src/bin/pg_dump/pg_backup_archiver.c > > > > +++ b/src/bin/pg_dump/pg_backup_archiver.c > > > > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH) > > > > ahprintf(AH, "SET statement_timeout = 0;\n"); > > > > ahprintf(AH, "SET lock_timeout = 0;\n"); > > > > ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n"); > > > > + ahprintf(AH, "SET transaction_timeout = 0;\n"); > > > > > > Hm - why is that the right thing to do? > > Because transaction_timeout has effects of statement_timeout. > > I guess it's just following precedent - but it seems a bit presumptuous > to just disable safety settings a DBA might have set up. That makes some > sense for e.g. idle_in_transaction_session_timeout, because I think > e.g. parallel backup can lead to a connection being idle for a bit. I do not know. My reasoning - everywhere we turn off statement_timeout, we should turn off transaction_timeout too. But I have no strong opinion here. I left this code as is in the patch so far. For the same reason I did not change anything in pg_backup_archiver.c. > > Either way we can do batch function enable_timeouts() instead > > enable_timeout_after(). > > > Does anything of it make sense? > > I'm at least as worried about the various calls *after* the execution of > a statement. I think this code is just a one bit check if (get_timeout_active(TRANSACTION_TIMEOUT)) inside of get_timeout_active(). With all 14 timeouts we have, I don't see a good way to optimize stuff so far. > > + if (tx_timeout_occurred) > > + { > > + LockErrorCleanup(); > > + ereport(ERROR, > > + (errcode(ERRCODE_TRANSACTION_TIMEOUT), > > + errmsg("canceling transaction due to transaction timeout"))); > > + } > > The number of calls to LockErrorCleanup() here feels wrong - there's > already 8 calls in ProcessInterrupts(). Besides the code duplication I > also think it's not a sane idea to rely on having LockErrorCleanup() > before all the relevant ereport(ERROR)s. I've refactored that code down to 7 calls of LockErrorCleanup() :) Logic behind various branches is not clear for me, e.g. why we do not LockErrorCleanup() when reading commands from a client? So I did not risk refactoring further. > I think the test should verify > that transaction timeout interacts correctly with statement timeout / > idle in tx timeout. I've added tests that check statement_timeout vs transaction_timeout. However I could not produce stable tests with idle_in_transaction_timeout vs transaction_timeout so far. But I'll look into this more. Actually, stabilizing statement_timeout vs transaction_timeout was tricky on Windows too. I had to remove the second call to pg_sleep(0.0001) because it was triggering 10ьs timeout from time to time. Also, test timeout was increased to 30ms, because unlike others in spec it's not supposed to happen at the very first SQL statement. Thank you! Best regards, Andrey Borodin.
Attachment
pgsql-hackers by date: