Thread: [PATCH] lock_timeout and common SIGALRM framework
Hi, attached is a patch to implement a framework to simplify and correctly nest multiplexing more than two timeout sources into the same SIGALRM signal handler. The framework is using a new internal API for timeouts: bool enable_timeout(TimeoutName tn, int delayms); bool disable_timeout(TimeoutName tn, bool keep_indicator); bool disable_all_timeouts(bool keep_indicators); A timeout source has these features to allow different initialization, cleanup and check functions and rescheduling: typedef void (*timeout_init)(TimestampTz, TimestampTz); typedef void (*timeout_destroy)(bool); typedef bool (*timeout_check)(void); typedef TimestampTz (*timeout_start)(void); typedef struct { TimeoutName index; bool resched_next; timeout_init timeout_init; timeout_destroy timeout_destroy; timeout_check timeout_check; timeout_start timeout_start; TimestampTz fin_time; } timeout_params; This makes it possible to differentiate between the standby and statement timeouts, regular deadlock and standby deadlock using the same signal handler function. And finally, this makes it possible to implement the lock_timeout feature that we at Cybertec implemented more than 2 years ago. The patch also adds two new tests into prepared_xacts.sql to trigger the lock_timeout instead of statement_timeout. Documentation and extensive comments are included. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Hi, 2012-04-04 12:30 keltezéssel, Boszormenyi Zoltan írta: > Hi, > > attached is a patch to implement a framework to simplify and > correctly nest multiplexing more than two timeout sources > into the same SIGALRM signal handler. > > The framework is using a new internal API for timeouts: > > bool enable_timeout(TimeoutName tn, int delayms); > bool disable_timeout(TimeoutName tn, bool keep_indicator); > bool disable_all_timeouts(bool keep_indicators); > > A timeout source has these features to allow different initialization, > cleanup and check functions and rescheduling: > > typedef void (*timeout_init)(TimestampTz, TimestampTz); > typedef void (*timeout_destroy)(bool); > typedef bool (*timeout_check)(void); > typedef TimestampTz (*timeout_start)(void); > > typedef struct { > TimeoutName index; > bool resched_next; > timeout_init timeout_init; > timeout_destroy timeout_destroy; > timeout_check timeout_check; > timeout_start timeout_start; > TimestampTz fin_time; > } timeout_params; > > This makes it possible to differentiate between the standby and > statement timeouts, regular deadlock and standby deadlock using > the same signal handler function. > > And finally, this makes it possible to implement the lock_timeout > feature that we at Cybertec implemented more than 2 years ago. > > The patch also adds two new tests into prepared_xacts.sql to trigger > the lock_timeout instead of statement_timeout. > > Documentation and extensive comments are included. Second version. Every timeout-related functions are now in a separate source file, src/backend/storage/timeout.c with accessor functions. There are no related global variables anymore, only the GUCs. > > Best regards, > Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
2012-04-04 15:17 keltezéssel, Boszormenyi Zoltan írta:
3rd and (for now) final version. Tidied comments, the time checks in
Check*() functions and function order in timeout.c.
Hi,
2012-04-04 12:30 keltezéssel, Boszormenyi Zoltan írta:Hi,
attached is a patch to implement a framework to simplify and
correctly nest multiplexing more than two timeout sources
into the same SIGALRM signal handler.
The framework is using a new internal API for timeouts:
bool enable_timeout(TimeoutName tn, int delayms);
bool disable_timeout(TimeoutName tn, bool keep_indicator);
bool disable_all_timeouts(bool keep_indicators);
A timeout source has these features to allow different initialization,
cleanup and check functions and rescheduling:
typedef void (*timeout_init)(TimestampTz, TimestampTz);
typedef void (*timeout_destroy)(bool);
typedef bool (*timeout_check)(void);
typedef TimestampTz (*timeout_start)(void);
typedef struct {
TimeoutName index;
bool resched_next;
timeout_init timeout_init;
timeout_destroy timeout_destroy;
timeout_check timeout_check;
timeout_start timeout_start;
TimestampTz fin_time;
} timeout_params;
This makes it possible to differentiate between the standby and
statement timeouts, regular deadlock and standby deadlock using
the same signal handler function.
And finally, this makes it possible to implement the lock_timeout
feature that we at Cybertec implemented more than 2 years ago.
The patch also adds two new tests into prepared_xacts.sql to trigger
the lock_timeout instead of statement_timeout.
Documentation and extensive comments are included.
Second version. Every timeout-related functions are now in a separate
source file, src/backend/storage/timeout.c with accessor functions.
There are no related global variables anymore, only the GUCs.
3rd and (for now) final version. Tidied comments, the time checks in
Check*() functions and function order in timeout.c.
Best regards,
Zoltán Böszörményi
-- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
2012-04-04 16:22 keltezéssel, Boszormenyi Zoltan írta:
I lied. This is the final one. I fixed a typo in the documentation
and replaced timeout_start_time (previously static to proc.c)
with get_timeout_start(DEADLOCK_TIMEOUT). This one should
have happened in the second version.
2012-04-04 15:17 keltezéssel, Boszormenyi Zoltan írta:Hi,
2012-04-04 12:30 keltezéssel, Boszormenyi Zoltan írta:Hi,
attached is a patch to implement a framework to simplify and
correctly nest multiplexing more than two timeout sources
into the same SIGALRM signal handler.
The framework is using a new internal API for timeouts:
bool enable_timeout(TimeoutName tn, int delayms);
bool disable_timeout(TimeoutName tn, bool keep_indicator);
bool disable_all_timeouts(bool keep_indicators);
A timeout source has these features to allow different initialization,
cleanup and check functions and rescheduling:
typedef void (*timeout_init)(TimestampTz, TimestampTz);
typedef void (*timeout_destroy)(bool);
typedef bool (*timeout_check)(void);
typedef TimestampTz (*timeout_start)(void);
typedef struct {
TimeoutName index;
bool resched_next;
timeout_init timeout_init;
timeout_destroy timeout_destroy;
timeout_check timeout_check;
timeout_start timeout_start;
TimestampTz fin_time;
} timeout_params;
This makes it possible to differentiate between the standby and
statement timeouts, regular deadlock and standby deadlock using
the same signal handler function.
And finally, this makes it possible to implement the lock_timeout
feature that we at Cybertec implemented more than 2 years ago.
The patch also adds two new tests into prepared_xacts.sql to trigger
the lock_timeout instead of statement_timeout.
Documentation and extensive comments are included.
Second version. Every timeout-related functions are now in a separate
source file, src/backend/storage/timeout.c with accessor functions.
There are no related global variables anymore, only the GUCs.
3rd and (for now) final version.
I lied. This is the final one. I fixed a typo in the documentation
and replaced timeout_start_time (previously static to proc.c)
with get_timeout_start(DEADLOCK_TIMEOUT). This one should
have happened in the second version.
Tidied comments, the time checks in
Check*() functions and function order in timeout.c.
Best regards,
Zoltán Böszörményi-- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
-- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
2012-04-04 17:12 keltezéssel, Boszormenyi Zoltan írta: <blockquote cite="mid:4F7C64F2.5070905@cybertec.at" type="cite">2012-04-04 16:22 keltezéssel, Boszormenyi Zoltan írta: <blockquote cite="mid:4F7C5925.107@cybertec.at" type="cite">2012-04-04 15:17 keltezéssel, Boszormenyi Zoltan írta: <blockquote cite="mid:4F7C49DA.7080008@cybertec.at" type="cite">Hi,<br /><br /> 2012-04-04 12:30 keltezéssel, Boszormenyi Zoltan írta: <br /><blockquote type="cite">Hi, <br/><br /> attached is a patch to implement a framework to simplify and <br /> correctly nest multiplexing more than twotimeout sources <br /> into the same SIGALRM signal handler. <br /><br /> The framework is using a new internal API fortimeouts: <br /><br /> bool enable_timeout(TimeoutName tn, int delayms); <br /> bool disable_timeout(TimeoutName tn, boolkeep_indicator); <br /> bool disable_all_timeouts(bool keep_indicators); <br /><br /> A timeout source has these featuresto allow different initialization, <br /> cleanup and check functions and rescheduling: <br /><br /> typedef void(*timeout_init)(TimestampTz, TimestampTz); <br /> typedef void (*timeout_destroy)(bool); <br /> typedef bool (*timeout_check)(void);<br /> typedef TimestampTz (*timeout_start)(void); <br /><br /> typedef struct { <br /> TimeoutName index; <br /> bool resched_next; <br /> timeout_init timeout_init; <br /> timeout_destroy timeout_destroy; <br /> timeout_check timeout_check; <br /> timeout_start timeout_start;<br /> TimestampTz fin_time; <br /> } timeout_params; <br /><br /> This makes it possible to differentiatebetween the standby and <br /> statement timeouts, regular deadlock and standby deadlock using <br /> the samesignal handler function. <br /><br /> And finally, this makes it possible to implement the lock_timeout <br /> featurethat we at Cybertec implemented more than 2 years ago. <br /><br /> The patch also adds two new tests into prepared_xacts.sqlto trigger <br /> the lock_timeout instead of statement_timeout. <br /><br /> Documentation and extensivecomments are included. <br /></blockquote><br /> Second version. Every timeout-related functions are now in a separate<br /> source file, src/backend/storage/timeout.c with accessor functions. <br /> There are no related global variablesanymore, only the GUCs. <br /></blockquote><br /> 3rd and (for now) final version.</blockquote><br /> I lied. Thisis the final one. I fixed a typo in the documentation<br /> and replaced timeout_start_time (previously static to proc.c)<br/> with get_timeout_start(DEADLOCK_TIMEOUT). This one should<br /> have happened in the second version.<br /><br/><blockquote cite="mid:4F7C5925.107@cybertec.at" type="cite"> Tidied comments, the time checks in<br /> Check*() functionsand function order in timeout.c.<br /><br /><blockquote cite="mid:4F7C49DA.7080008@cybertec.at" type="cite"><br/><blockquote type="cite"><br /> Best regards, <br /> Zoltán Böszörményi <br /></blockquote></blockquote></blockquote></blockquote><br/> One comment for testers: all the timeout GUC values are givenin<br /> milliseconds, the kernel interface (setitimer) and TimestampTz uses<br /> microseconds.<br /><br /> The transactionthat locks is inherently a read/write one and by the time<br /> the code reaches ProcSleep(), at least a few tensof microseconds has<br /> already passed since the start of the statement even on 3GHz+ CPUs.<br /><br /> Not to mentionthat computers nowadays have high precision timers<br /> and OSs running on them utilitize those. So, the time returnedby<br /> GetCurrentStatementStartTimestamp() will certainly be different from<br /> GetCurrentTimestamp(). This meansthat the timeouts' fin_time will also<br /> be different.<br /><br /> Long story short, using the same value for statement_timeoutand<br /> lock_timeout (or deadlock_timeout for that matter) means that<br /> statement_timeout will triggerfirst. The story is different only on<br /> a combination of a fast CPU and an OS with greater-then-millisecond<br/> timer resolution.<br /><br /> Best regards,<br /> Zoltán Böszörményi<br /><br /><pre class="moz-signature"cols="90">-- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: <a class="moz-txt-link-freetext" href="http://www.postgresql-support.de">http://www.postgresql-support.de</a> <aclass="moz-txt-link-freetext" href="http://www.postgresql.at/">http://www.postgresql.at/</a> </pre>
I think this patch is doing two things: first touching infrastructure stuff and then adding lock_timeout on top of that. Would it work to split the patch in two pieces? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: > I think this patch is doing two things: first touching infrastructure > stuff and then adding lock_timeout on top of that. Would it work to > split the patch in two pieces? > Sure. Attached is the split version. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
On 05/04/12 08:02, Boszormenyi Zoltan wrote: > 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: >> I think this patch is doing two things: first touching infrastructure >> stuff and then adding lock_timeout on top of that. Would it work to >> split the patch in two pieces? >> > > Sure. Attached is the split version. > > Best regards, > Zoltán Böszörményi > Hi, I've started looking at and testing both patches. Technically speaking, I think the source looks much better than the first version of lock timeout, and may help adding other timeouts in the future. I haven't tested it in depth though, because I encountered the following problem: While testing the patch, I found a way to crash PG. But what's weird is that it crashes also with an unpatched git version. Here is the way to reproduce it (I have done it with a pgbench schema): - Set a small statement_timeout (just to save time during the tests) Session1: =#BEGIN; =#lock TABLE pgbench_accounts ; Session 2: =#BEGIN; =#lock TABLE pgbench_accounts ; ERROR: canceling statement due to statement timeout =# lock TABLE pgbench_accounts ; I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be done with a rollback to savepoint of course. Session 2 crashes with this : TRAP : FailedAssertion(« !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : 749). It can also be done without a statement_timeout, and a control-C on the second lock table. I didn't touch anything but this. It occurs everytime, when asserts are activated. I tried it on 9.1.3, and I couldn't make it crash with the same sequence of events. So maybe it's something introduced since ? Or is the assert still valid ? Cheers
2012-04-06 14:47 keltezéssel, Cousin Marc írta: > On 05/04/12 08:02, Boszormenyi Zoltan wrote: >> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: >>> I think this patch is doing two things: first touching infrastructure >>> stuff and then adding lock_timeout on top of that. Would it work to >>> split the patch in two pieces? >>> >> Sure. Attached is the split version. >> >> Best regards, >> Zoltán Böszörményi >> > Hi, > > I've started looking at and testing both patches. > > Technically speaking, I think the source looks much better than the > first version of lock timeout, and may help adding other timeouts in the > future. Thanks. > I haven't tested it in depth though, because I encountered the > following problem: > > While testing the patch, I found a way to crash PG. But what's weird is > that it crashes also with an unpatched git version. > > Here is the way to reproduce it (I have done it with a pgbench schema): > > - Set a small statement_timeout (just to save time during the tests) > > Session1: > =#BEGIN; > =#lock TABLE pgbench_accounts ; > > Session 2: > =#BEGIN; > =#lock TABLE pgbench_accounts ; > ERROR: canceling statement due to statement timeout > =# lock TABLE pgbench_accounts ; > > I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be > done with a rollback to savepoint of course. > > Session 2 crashes with this : TRAP : FailedAssertion(« > !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : > 749). > > It can also be done without a statement_timeout, and a control-C on the > second lock table. Indeed, the unpatched GIT version crashes if you enter =#lock TABLE pgbench_accounts ; the second time in session 2 after the first one failed. Also, manually spelling it out: Session 1: $ psql psql (9.2devel) Type "help" for help. zozo=# begin; BEGIN zozo=# lock table pgbench_accounts; LOCK TABLE zozo=# Session 2: zozo=# begin; BEGIN zozo=# savepoint a; SAVEPOINT zozo=# lock table pgbench_accounts; ERROR: canceling statement due to statement timeout zozo=# rollback to a; ROLLBACK zozo=# savepoint b; SAVEPOINT zozo=# lock table pgbench_accounts; The connection to the server was lost. Attempting reset: Failed. !> Server log after the second lock table: TRAP: FailedAssertion("!(locallock->holdsStrongLockCount == 0)", File: "lock.c", Line: 749) LOG: server process (PID 12978) was terminated by signal 6: Aborted Best regards, Zoltán Böszörményi > > I didn't touch anything but this. It occurs everytime, when asserts are > activated. > > I tried it on 9.1.3, and I couldn't make it crash with the same sequence > of events. So maybe it's something introduced since ? Or is the assert > still valid ? > > Cheers > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2012-04-08 11:24 keltezéssel, Boszormenyi Zoltan írta: > 2012-04-06 14:47 keltezéssel, Cousin Marc írta: >> On 05/04/12 08:02, Boszormenyi Zoltan wrote: >>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: >>>> I think this patch is doing two things: first touching infrastructure >>>> stuff and then adding lock_timeout on top of that. Would it work to >>>> split the patch in two pieces? >>>> >>> Sure. Attached is the split version. >>> >>> Best regards, >>> Zoltán Böszörményi >>> >> Hi, >> >> I've started looking at and testing both patches. >> >> Technically speaking, I think the source looks much better than the >> first version of lock timeout, and may help adding other timeouts in the >> future. > > Thanks. > >> I haven't tested it in depth though, because I encountered the >> following problem: >> >> While testing the patch, I found a way to crash PG. But what's weird is >> that it crashes also with an unpatched git version. >> >> Here is the way to reproduce it (I have done it with a pgbench schema): >> >> - Set a small statement_timeout (just to save time during the tests) >> >> Session1: >> =#BEGIN; >> =#lock TABLE pgbench_accounts ; >> >> Session 2: >> =#BEGIN; >> =#lock TABLE pgbench_accounts ; >> ERROR: canceling statement due to statement timeout >> =# lock TABLE pgbench_accounts ; >> >> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be >> done with a rollback to savepoint of course. >> >> Session 2 crashes with this : TRAP : FailedAssertion(« >> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : >> 749). >> >> It can also be done without a statement_timeout, and a control-C on the >> second lock table. > > Indeed, the unpatched GIT version crashes if you enter > =#lock TABLE pgbench_accounts ; > the second time in session 2 after the first one failed. Also, > manually spelling it out: > > Session 1: > > $ psql > psql (9.2devel) > Type "help" for help. > > zozo=# begin; > BEGIN > zozo=# lock table pgbench_accounts; > LOCK TABLE > zozo=# > > Session 2: > > zozo=# begin; > BEGIN > zozo=# savepoint a; > SAVEPOINT > zozo=# lock table pgbench_accounts; > ERROR: canceling statement due to statement timeout > zozo=# rollback to a; > ROLLBACK > zozo=# savepoint b; > SAVEPOINT > zozo=# lock table pgbench_accounts; > The connection to the server was lost. Attempting reset: Failed. > !> > > Server log after the second lock table: > > TRAP: FailedAssertion("!(locallock->holdsStrongLockCount == 0)", File: "lock.c", Line: 749) > LOG: server process (PID 12978) was terminated by signal 6: Aborted > > Best regards, > Zoltán Böszörményi Robert, the Assert triggering with the above procedure is in your "fast path" locking code with current GIT. Best regards, Zoltán Böszörményi > >> >> I didn't touch anything but this. It occurs everytime, when asserts are >> activated. >> >> I tried it on 9.1.3, and I couldn't make it crash with the same sequence >> of events. So maybe it's something introduced since ? Or is the assert >> still valid ? >> >> Cheers >> > > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2012-04-06 14:47 keltezéssel, Cousin Marc írta: > On 05/04/12 08:02, Boszormenyi Zoltan wrote: >> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: >>> I think this patch is doing two things: first touching infrastructure >>> stuff and then adding lock_timeout on top of that. Would it work to >>> split the patch in two pieces? >>> >> Sure. Attached is the split version. >> >> Best regards, >> Zoltán Böszörményi >> > Hi, > > I've started looking at and testing both patches. > > Technically speaking, I think the source looks much better than the > first version of lock timeout, and may help adding other timeouts in the > future. I haven't tested it in depth though, because I encountered the > following problem: > > While testing the patch, I found a way to crash PG. But what's weird is > that it crashes also with an unpatched git version. > > Here is the way to reproduce it (I have done it with a pgbench schema): > > - Set a small statement_timeout (just to save time during the tests) > > Session1: > =#BEGIN; > =#lock TABLE pgbench_accounts ; > > Session 2: > =#BEGIN; > =#lock TABLE pgbench_accounts ; > ERROR: canceling statement due to statement timeout > =# lock TABLE pgbench_accounts ; > > I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be > done with a rollback to savepoint of course. > > Session 2 crashes with this : TRAP : FailedAssertion(« > !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : > 749). > > It can also be done without a statement_timeout, and a control-C on the > second lock table. > > I didn't touch anything but this. It occurs everytime, when asserts are > activated. > > I tried it on 9.1.3, and I couldn't make it crash with the same sequence > of events. So maybe it's something introduced since ? Or is the assert > still valid ? > > Cheers > Attached are the new patches. I rebased them to current GIT and they are expected to be applied after Robert Haas' patch in the "bug in fast-path locking" thread. Now it survives the above scenario. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
2012-04-10 09:02 keltezéssel, Boszormenyi Zoltan írta: > 2012-04-06 14:47 keltezéssel, Cousin Marc írta: >> On 05/04/12 08:02, Boszormenyi Zoltan wrote: >>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: >>>> I think this patch is doing two things: first touching infrastructure >>>> stuff and then adding lock_timeout on top of that. Would it work to >>>> split the patch in two pieces? >>>> >>> Sure. Attached is the split version. >>> >>> Best regards, >>> Zoltán Böszörményi >>> >> Hi, >> >> I've started looking at and testing both patches. >> >> Technically speaking, I think the source looks much better than the >> first version of lock timeout, and may help adding other timeouts in the >> future. I haven't tested it in depth though, because I encountered the >> following problem: >> >> While testing the patch, I found a way to crash PG. But what's weird is >> that it crashes also with an unpatched git version. >> >> Here is the way to reproduce it (I have done it with a pgbench schema): >> >> - Set a small statement_timeout (just to save time during the tests) >> >> Session1: >> =#BEGIN; >> =#lock TABLE pgbench_accounts ; >> >> Session 2: >> =#BEGIN; >> =#lock TABLE pgbench_accounts ; >> ERROR: canceling statement due to statement timeout >> =# lock TABLE pgbench_accounts ; >> >> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be >> done with a rollback to savepoint of course. >> >> Session 2 crashes with this : TRAP : FailedAssertion(« >> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : >> 749). >> >> It can also be done without a statement_timeout, and a control-C on the >> second lock table. >> >> I didn't touch anything but this. It occurs everytime, when asserts are >> activated. >> >> I tried it on 9.1.3, and I couldn't make it crash with the same sequence >> of events. So maybe it's something introduced since ? Or is the assert >> still valid ? >> >> Cheers >> > > Attached are the new patches. I rebased them to current GIT and > they are expected to be applied after Robert Haas' patch in the > "bug in fast-path locking" thread. > > Now it survives the above scenario. > > Best regards, > Zoltán Böszörményi New patch attached, rebased to today's GIT. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
On Mon, 2012-04-23 at 10:53 +0200, Boszormenyi Zoltan wrote: > 2012-04-10 09:02 keltezéssel, Boszormenyi Zoltan írta: > > 2012-04-06 14:47 keltezéssel, Cousin Marc írta: > >> On 05/04/12 08:02, Boszormenyi Zoltan wrote: > >>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: > >>>> I think this patch is doing two things: first touching infrastructure > >>>> stuff and then adding lock_timeout on top of that. Would it work to > >>>> split the patch in two pieces? > >>>> > >>> Sure. Attached is the split version. > >>> > >>> Best regards, > >>> Zoltán Böszörményi > >>> > >> Hi, > >> > >> I've started looking at and testing both patches. > >> > >> Technically speaking, I think the source looks much better than the > >> first version of lock timeout, and may help adding other timeouts in the > >> future. I haven't tested it in depth though, because I encountered the > >> following problem: > >> > >> While testing the patch, I found a way to crash PG. But what's weird is > >> that it crashes also with an unpatched git version. > >> > >> Here is the way to reproduce it (I have done it with a pgbench schema): > >> > >> - Set a small statement_timeout (just to save time during the tests) > >> > >> Session1: > >> =#BEGIN; > >> =#lock TABLE pgbench_accounts ; > >> > >> Session 2: > >> =#BEGIN; > >> =#lock TABLE pgbench_accounts ; > >> ERROR: canceling statement due to statement timeout > >> =# lock TABLE pgbench_accounts ; > >> > >> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be > >> done with a rollback to savepoint of course. > >> > >> Session 2 crashes with this : TRAP : FailedAssertion(« > >> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : > >> 749). > >> > >> It can also be done without a statement_timeout, and a control-C on the > >> second lock table. > >> > >> I didn't touch anything but this. It occurs everytime, when asserts are > >> activated. > >> > >> I tried it on 9.1.3, and I couldn't make it crash with the same sequence > >> of events. So maybe it's something introduced since ? Or is the assert > >> still valid ? > >> > >> Cheers > >> > > > > Attached are the new patches. I rebased them to current GIT and > > they are expected to be applied after Robert Haas' patch in the > > "bug in fast-path locking" thread. > > > > Now it survives the above scenario. > > > > Best regards, > > Zoltán Böszörményi > > New patch attached, rebased to today's GIT. > > Best regards, > Zoltán Böszörményi > Ok, I've done what was missing from the review (from when I had a bug in locking the other day), so here is the full review. By the way, this patch doesn't belong to current commitfest, but to the next one. Is the patch in context diff format? Yes Does it apply cleanly to the current git master? Yes Does it include reasonable tests, necessary doc patches, etc? The new lock_timeout GUC is documented. There are regression tests. Read what the patch is supposed to do, and consider: Does the patch actually implement that? Yes Do we want that? I do. Mostly for administrative jobs which could lock the whole application. It would be much easier to run reindexes, vacuum full, etc… without worrying about bringing application down because of lock contention. Do we already have it? No. Does it follow SQL spec, or the community-agreed behavior? I don't know if there is a consensus on this new GUC. statement_timeout is obviously not in the SQL spec. Does it include pg_dump support (if applicable)? Not applicable Are there dangers? Yes, as it rewrites all the timeout code. I feel it is much cleaner this way, as there is a generic set of function managing all sigalarm code, but it heavily touches this part. Have all the bases been covered? I tried all sql statements I could think of (select, insert, update, delete, truncate, drop, create index, adding constraint, lock. I tried having statement_timeout, lock_timeout and deadlock_timeout at very short and close or equal values. It worked too. Rollback to savepoint while holding locks dont crash PostgreSQL anymore. Other timeouts such as archive_timeout and checkpoint_timeout still work. Does the feature work as advertised? Yes Are there corner cases the author has failed to consider? I didn't find any. Are there any assertion failures or crashes? No. Does the patch slow down simple tests? No If it claims to improve performance, does it? Not applicable Does it slow down other things? No Does it follow the project coding guidelines? I think so Are there portability issues? No, all the portable code (acquiring locks and manipulating sigalarm) is the same as before. Will it work on Windows/BSD etc? It should. I couldn't test it though. Are the comments sufficient and accurate? Yes Does it do what it says, correctly? Yes Does it produce compiler warnings? No Can you make it crash? Not anymore Is everything done in a way that fits together coherently with other features/modules? Yes, I think so. The new way of handling sigalarm seems more robust to me. Are there interdependencies that can cause problems? I don't see any. Regards, Marc
2012-04-23 15:08 keltezéssel, Marc Cousin írta: > On Mon, 2012-04-23 at 10:53 +0200, Boszormenyi Zoltan wrote: >> 2012-04-10 09:02 keltezéssel, Boszormenyi Zoltan írta: >>> 2012-04-06 14:47 keltezéssel, Cousin Marc írta: >>>> On 05/04/12 08:02, Boszormenyi Zoltan wrote: >>>>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: >>>>>> I think this patch is doing two things: first touching infrastructure >>>>>> stuff and then adding lock_timeout on top of that. Would it work to >>>>>> split the patch in two pieces? >>>>>> >>>>> Sure. Attached is the split version. >>>>> >>>>> Best regards, >>>>> Zoltán Böszörményi >>>>> >>>> Hi, >>>> >>>> I've started looking at and testing both patches. >>>> >>>> Technically speaking, I think the source looks much better than the >>>> first version of lock timeout, and may help adding other timeouts in the >>>> future. I haven't tested it in depth though, because I encountered the >>>> following problem: >>>> >>>> While testing the patch, I found a way to crash PG. But what's weird is >>>> that it crashes also with an unpatched git version. >>>> >>>> Here is the way to reproduce it (I have done it with a pgbench schema): >>>> >>>> - Set a small statement_timeout (just to save time during the tests) >>>> >>>> Session1: >>>> =#BEGIN; >>>> =#lock TABLE pgbench_accounts ; >>>> >>>> Session 2: >>>> =#BEGIN; >>>> =#lock TABLE pgbench_accounts ; >>>> ERROR: canceling statement due to statement timeout >>>> =# lock TABLE pgbench_accounts ; >>>> >>>> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be >>>> done with a rollback to savepoint of course. >>>> >>>> Session 2 crashes with this : TRAP : FailedAssertion(« >>>> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : >>>> 749). >>>> >>>> It can also be done without a statement_timeout, and a control-C on the >>>> second lock table. >>>> >>>> I didn't touch anything but this. It occurs everytime, when asserts are >>>> activated. >>>> >>>> I tried it on 9.1.3, and I couldn't make it crash with the same sequence >>>> of events. So maybe it's something introduced since ? Or is the assert >>>> still valid ? >>>> >>>> Cheers >>>> >>> Attached are the new patches. I rebased them to current GIT and >>> they are expected to be applied after Robert Haas' patch in the >>> "bug in fast-path locking" thread. >>> >>> Now it survives the above scenario. >>> >>> Best regards, >>> Zoltán Böszörményi >> New patch attached, rebased to today's GIT. >> >> Best regards, >> Zoltán Böszörményi >> > Ok, I've done what was missing from the review (from when I had a bug in > locking the other day), so here is the full review. By the way, this > patch doesn't belong to current commitfest, but to the next one. It was added to 2012-Next when I posted it, 2012-01 was already closed for new additions. > Is the patch in context diff format? > Yes > > Does it apply cleanly to the current git master? > Yes > > Does it include reasonable tests, necessary doc patches, etc? > The new lock_timeout GUC is documented. There are regression tests. > > Read what the patch is supposed to do, and consider: > Does the patch actually implement that? > Yes > > Do we want that? > I do. Mostly for administrative jobs which could lock the whole > application. It would be much easier to run reindexes, vacuum full, etc… > without worrying about bringing application down because of lock > contention. > > Do we already have it? > No. > > Does it follow SQL spec, or the community-agreed behavior? > I don't know if there is a consensus on this new GUC. statement_timeout > is obviously not in the SQL spec. > > Does it include pg_dump support (if applicable)? > Not applicable > > Are there dangers? > Yes, as it rewrites all the timeout code. I feel it is much cleaner this > way, as there is a generic set of function managing all sigalarm code, > but it heavily touches this part. > > Have all the bases been covered? > I tried all sql statements I could think of (select, insert, update, > delete, truncate, drop, create index, adding constraint, lock. > > I tried having statement_timeout, lock_timeout and deadlock_timeout at > very short and close or equal values. It worked too. > > Rollback to savepoint while holding locks dont crash PostgreSQL anymore. > > Other timeouts such as archive_timeout and checkpoint_timeout still > work. > > Does the feature work as advertised? > Yes > > Are there corner cases the author has failed to consider? > I didn't find any. > > Are there any assertion failures or crashes? > No. > > Does the patch slow down simple tests? > No > > If it claims to improve performance, does it? > Not applicable > > Does it slow down other things? > No > > Does it follow the project coding guidelines? > I think so > > Are there portability issues? > No, all the portable code (acquiring locks and manipulating sigalarm) is > the same as before. > > Will it work on Windows/BSD etc? > It should. I couldn't test it though. > > Are the comments sufficient and accurate? > Yes > > Does it do what it says, correctly? > Yes > > Does it produce compiler warnings? > No > > Can you make it crash? > Not anymore > > Is everything done in a way that fits together coherently with other > features/modules? > Yes, I think so. The new way of handling sigalarm seems more robust to > me. > > Are there interdependencies that can cause problems? > I don't see any. > > Regards, > > Marc Thanks for the review. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Hi, 2012-04-24 09:00 keltezéssel, Boszormenyi Zoltan írta: > 2012-04-23 15:08 keltezéssel, Marc Cousin írta: >> On Mon, 2012-04-23 at 10:53 +0200, Boszormenyi Zoltan wrote: >>> 2012-04-10 09:02 keltezéssel, Boszormenyi Zoltan írta: >>>> 2012-04-06 14:47 keltezéssel, Cousin Marc írta: >>>>> On 05/04/12 08:02, Boszormenyi Zoltan wrote: >>>>>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: >>>>>>> I think this patch is doing two things: first touching infrastructure >>>>>>> stuff and then adding lock_timeout on top of that. Would it work to >>>>>>> split the patch in two pieces? >>>>>>> >>>>>> Sure. Attached is the split version. >>>>>> >>>>>> Best regards, >>>>>> Zoltán Böszörményi >>>>>> >>>>> Hi, >>>>> >>>>> I've started looking at and testing both patches. >>>>> >>>>> Technically speaking, I think the source looks much better than the >>>>> first version of lock timeout, and may help adding other timeouts in the >>>>> future. I haven't tested it in depth though, because I encountered the >>>>> following problem: >>>>> >>>>> While testing the patch, I found a way to crash PG. But what's weird is >>>>> that it crashes also with an unpatched git version. >>>>> >>>>> Here is the way to reproduce it (I have done it with a pgbench schema): >>>>> >>>>> - Set a small statement_timeout (just to save time during the tests) >>>>> >>>>> Session1: >>>>> =#BEGIN; >>>>> =#lock TABLE pgbench_accounts ; >>>>> >>>>> Session 2: >>>>> =#BEGIN; >>>>> =#lock TABLE pgbench_accounts ; >>>>> ERROR: canceling statement due to statement timeout >>>>> =# lock TABLE pgbench_accounts ; >>>>> >>>>> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be >>>>> done with a rollback to savepoint of course. >>>>> >>>>> Session 2 crashes with this : TRAP : FailedAssertion(« >>>>> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : >>>>> 749). >>>>> >>>>> It can also be done without a statement_timeout, and a control-C on the >>>>> second lock table. >>>>> >>>>> I didn't touch anything but this. It occurs everytime, when asserts are >>>>> activated. >>>>> >>>>> I tried it on 9.1.3, and I couldn't make it crash with the same sequence >>>>> of events. So maybe it's something introduced since ? Or is the assert >>>>> still valid ? >>>>> >>>>> Cheers >>>>> >>>> Attached are the new patches. I rebased them to current GIT and >>>> they are expected to be applied after Robert Haas' patch in the >>>> "bug in fast-path locking" thread. >>>> >>>> Now it survives the above scenario. >>>> >>>> Best regards, >>>> Zoltán Böszörményi >>> New patch attached, rebased to today's GIT. >>> >>> Best regards, >>> Zoltán Böszörményi >>> >> Ok, I've done what was missing from the review (from when I had a bug in >> locking the other day), so here is the full review. By the way, this >> patch doesn't belong to current commitfest, but to the next one. > > It was added to 2012-Next when I posted it, 2012-01 was already > closed for new additions. > >> Is the patch in context diff format? >> Yes >> >> Does it apply cleanly to the current git master? >> Yes >> >> Does it include reasonable tests, necessary doc patches, etc? >> The new lock_timeout GUC is documented. There are regression tests. >> >> Read what the patch is supposed to do, and consider: >> Does the patch actually implement that? >> Yes >> >> Do we want that? >> I do. Mostly for administrative jobs which could lock the whole >> application. It would be much easier to run reindexes, vacuum full, etc… >> without worrying about bringing application down because of lock >> contention. >> >> Do we already have it? >> No. >> >> Does it follow SQL spec, or the community-agreed behavior? >> I don't know if there is a consensus on this new GUC. statement_timeout >> is obviously not in the SQL spec. >> >> Does it include pg_dump support (if applicable)? >> Not applicable >> >> Are there dangers? >> Yes, as it rewrites all the timeout code. I feel it is much cleaner this >> way, as there is a generic set of function managing all sigalarm code, >> but it heavily touches this part. >> >> Have all the bases been covered? >> I tried all sql statements I could think of (select, insert, update, >> delete, truncate, drop, create index, adding constraint, lock. >> >> I tried having statement_timeout, lock_timeout and deadlock_timeout at >> very short and close or equal values. It worked too. >> >> Rollback to savepoint while holding locks dont crash PostgreSQL anymore. >> >> Other timeouts such as archive_timeout and checkpoint_timeout still >> work. >> >> Does the feature work as advertised? >> Yes >> >> Are there corner cases the author has failed to consider? >> I didn't find any. >> >> Are there any assertion failures or crashes? >> No. >> >> Does the patch slow down simple tests? >> No >> >> If it claims to improve performance, does it? >> Not applicable >> >> Does it slow down other things? >> No >> >> Does it follow the project coding guidelines? >> I think so >> >> Are there portability issues? >> No, all the portable code (acquiring locks and manipulating sigalarm) is >> the same as before. >> >> Will it work on Windows/BSD etc? >> It should. I couldn't test it though. >> >> Are the comments sufficient and accurate? >> Yes >> >> Does it do what it says, correctly? >> Yes >> >> Does it produce compiler warnings? >> No >> >> Can you make it crash? >> Not anymore >> >> Is everything done in a way that fits together coherently with other >> features/modules? >> Yes, I think so. The new way of handling sigalarm seems more robust to >> me. >> >> Are there interdependencies that can cause problems? >> I don't see any. >> >> Regards, >> >> Marc > > Thanks for the review. > > Best regards, > Zoltán Böszörményi > New patches are attached, rebased to current GIT to avoid a (trivial) reject. No other changes. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Hi, another rebasing and applied the GIT changes in ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the Windows implementation of PGSemaphoreTimedLock. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 2012: > Hi, > > another rebasing and applied the GIT changes in > ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the > Windows implementation of PGSemaphoreTimedLock. Hi, I gave the framework patch a look. One thing I'm not sure about is the way you've defined the API. It seems a bit strange to have a nice and clean, generic interface in timeout.h; and then have the internal implementation of the API cluttered with details such as what to do when the deadlock timeout expires. Wouldn't it be much better if we could keep the details of CheckDeadLock itself within proc.c, for example? Same for the other specific Check*Timeout functions. It seems to me that the only timeout.c specific requirement is to be able to poke base_timeouts[].indicator and fin_time. Maybe that can get done in timeout.c and then have the generic checker call the module-specific checker. ... I see that you have things to do before and after setting "indicator". Maybe you could split the module-specific Check functions in two and have timeout.c call each in turn. Other than that I don't see that this should pose any difficulty. Also, I see you're calling GetCurrentTimestamp() in the checkers; maybe more than once per signal if you have multiple timeouts enabled. Maybe it'd be better to call it in once handle_sig_alarm and then pass the value down, if necessary (AFAICS if you restructure the code as I suggest above, you don't need to get the value down the the module-specific code). As for the Init*Timeout() and Destroy*Timeout() functions, they seem a bit pointless -- I mean if they just always call the generic InitTimeout and DestroyTimeout functions, why not just define the struct in a way that makes the specific functions unnecessary? Maybe you even already have all that's necessary ... I think you just change base_timeouts[i].timeout_destroy(false); to DestroyTimeout(i, false) and so on. Minor nitpick: the "Interface:" comment in timeout.c seems useless. That kind of thing tends to get overlooked and obsolete over time. We have examples of such things all over the place. I'd just rip it and have timeout.h be the interface documentation. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
BTW it doesn't seem that datatype/timestamp.h is really necessary in timeout.h. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-06-18 19:46 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 2012: >> Hi, >> >> another rebasing and applied the GIT changes in >> ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the >> Windows implementation of PGSemaphoreTimedLock. > Hi, > > I gave the framework patch a look. One thing I'm not sure about is the > way you've defined the API. It seems a bit strange to have a nice and > clean, generic interface in timeout.h; and then have the internal > implementation of the API cluttered with details such as what to do when > the deadlock timeout expires. Wouldn't it be much better if we could > keep the details of CheckDeadLock itself within proc.c, for example? Do you mean adding a callback function argument to for enable_timeout() would be better? > Same for the other specific Check*Timeout functions. It seems to me > that the only timeout.c specific requirement is to be able to poke > base_timeouts[].indicator and fin_time. Maybe that can get done in > timeout.c and then have the generic checker call the module-specific > checker. Or instead of static functions, Check* functions can be external to timeout.c. It seemed to be a good idea to move all the timeout related functions to timeout.c. > ... I see that you have things to do before and after setting > "indicator". Maybe you could split the module-specific Check functions > in two and have timeout.c call each in turn. Other than that I don't > see that this should pose any difficulty. > > Also, I see you're calling GetCurrentTimestamp() in the checkers; maybe > more than once per signal if you have multiple timeouts enabled. Actually, GetCurrentTimestamp() is not called multiple times, because only the first timeout source in the queue can get triggered. > Maybe > it'd be better to call it in once handle_sig_alarm and then pass the > value down, if necessary (AFAICS if you restructure the code as I > suggest above, you don't need to get the value down the the > module-specific code). But yes, this way it can be cleaner. > > As for the Init*Timeout() and Destroy*Timeout() functions, they seem > a bit pointless -- I mean if they just always call the generic > InitTimeout and DestroyTimeout functions, why not just define the struct > in a way that makes the specific functions unnecessary? Maybe you even > already have all that's necessary ... I think you just change > base_timeouts[i].timeout_destroy(false); to DestroyTimeout(i, false) and > so on. OK, I will experiment with it. > Minor nitpick: the "Interface:" comment in timeout.c seems useless. > That kind of thing tends to get overlooked and obsolete over time. We > have examples of such things all over the place. I'd just rip it and > have timeout.h be the interface documentation. OK. > BTW it doesn't seem that datatype/timestamp.h is really necessary in > timeout.h. You are wrong. get_timeout_start() returns TimestampTz and it's defined in datatype/timestamp.h. Thanks for the review, I will send the new code soon. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Hi, attached are the new patches. The base patch is now simplified and is about 10K smaller: - no more individual Init* and Destroy* functions - Check* functions don't check for "now", it's done centrally by the signal handler - CheckDeadLock() is moved back to proc.c and made public for timeout.c - a new header storage/proctimeout.h is introduced, so timeout.c can know about CheckDeadLock() The lock_timeout patch gained a new feature and enum GUC: SET lock_timeout_option = { 'per_lock' | 'per_statement' } ; 'per_lock' is the default and carries the previous behaviour: the timeout value applies to all locks individually. The statement may take up to N*timeout. 'per_statement' behaves like statement_timeout. The important difference is that statement_timeout may trigger during the executor phase when a long result set is already being returned to the client and the transfer is cut because of the timeout. On the other hand, lock_timeout may only trigger during locking the objects and if all locks were granted under the specified time, the result set is then returned to the client. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Excerpts from Boszormenyi Zoltan's message of mar jun 19 04:44:35 -0400 2012: > > 2012-06-18 19:46 keltezéssel, Alvaro Herrera írta: > > Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 2012: > >> Hi, > >> > >> another rebasing and applied the GIT changes in > >> ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the > >> Windows implementation of PGSemaphoreTimedLock. > > Hi, > > > > I gave the framework patch a look. One thing I'm not sure about is the > > way you've defined the API. It seems a bit strange to have a nice and > > clean, generic interface in timeout.h; and then have the internal > > implementation of the API cluttered with details such as what to do when > > the deadlock timeout expires. Wouldn't it be much better if we could > > keep the details of CheckDeadLock itself within proc.c, for example? > > Do you mean adding a callback function argument to for enable_timeout() > would be better? Well, that's not exactly what I was thinking, but maybe your idea is better than what I had in mind. > > Same for the other specific Check*Timeout functions. It seems to me > > that the only timeout.c specific requirement is to be able to poke > > base_timeouts[].indicator and fin_time. Maybe that can get done in > > timeout.c and then have the generic checker call the module-specific > > checker. > > Or instead of static functions, Check* functions can be external > to timeout.c. It seemed to be a good idea to move all the timeout > related functions to timeout.c. Yeah, except that they play with members of the timeout_params struct, which hopefully does not need to be exported. So if you can just move (that is, put back in their original places) the portions that do not touch that strcut, it'd be great. > > Also, I see you're calling GetCurrentTimestamp() in the checkers; maybe > > more than once per signal if you have multiple timeouts enabled. > > Actually, GetCurrentTimestamp() is not called multiple times, > because only the first timeout source in the queue can get triggered. I see. > > BTW it doesn't seem that datatype/timestamp.h is really necessary in > > timeout.h. > > You are wrong. get_timeout_start() returns TimestampTz and it's > defined in datatype/timestamp.h. Oh, right. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-06-19 16:54 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of mar jun 19 04:44:35 -0400 2012: >> 2012-06-18 19:46 keltezéssel, Alvaro Herrera írta: >>> Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 2012: >>>> Hi, >>>> >>>> another rebasing and applied the GIT changes in >>>> ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the >>>> Windows implementation of PGSemaphoreTimedLock. >>> Hi, >>> >>> I gave the framework patch a look. One thing I'm not sure about is the >>> way you've defined the API. It seems a bit strange to have a nice and >>> clean, generic interface in timeout.h; and then have the internal >>> implementation of the API cluttered with details such as what to do when >>> the deadlock timeout expires. Wouldn't it be much better if we could >>> keep the details of CheckDeadLock itself within proc.c, for example? >> Do you mean adding a callback function argument to for enable_timeout() >> would be better? > Well, that's not exactly what I was thinking, but maybe your idea is > better than what I had in mind. > >>> Same for the other specific Check*Timeout functions. It seems to me >>> that the only timeout.c specific requirement is to be able to poke >>> base_timeouts[].indicator and fin_time. Maybe that can get done in >>> timeout.c and then have the generic checker call the module-specific >>> checker. >> Or instead of static functions, Check* functions can be external >> to timeout.c. It seemed to be a good idea to move all the timeout >> related functions to timeout.c. > Yeah, except that they play with members of the timeout_params struct, > which hopefully does not need to be exported. So if you can just move > (that is, put back in their original places) the portions that do not > touch that strcut, it'd be great. OK, all 4 Check* functions are now moved back into proc.c, nothing outside of timeout.c touches anything in it. New patches are attached. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
2012-06-19 18:44 keltezéssel, Boszormenyi Zoltan írta: > 2012-06-19 16:54 keltezéssel, Alvaro Herrera írta: >> Excerpts from Boszormenyi Zoltan's message of mar jun 19 04:44:35 -0400 2012: >>> 2012-06-18 19:46 keltezéssel, Alvaro Herrera írta: >>>> Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 2012: >>>>> Hi, >>>>> >>>>> another rebasing and applied the GIT changes in >>>>> ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the >>>>> Windows implementation of PGSemaphoreTimedLock. >>>> Hi, >>>> >>>> I gave the framework patch a look. One thing I'm not sure about is the >>>> way you've defined the API. It seems a bit strange to have a nice and >>>> clean, generic interface in timeout.h; and then have the internal >>>> implementation of the API cluttered with details such as what to do when >>>> the deadlock timeout expires. Wouldn't it be much better if we could >>>> keep the details of CheckDeadLock itself within proc.c, for example? >>> Do you mean adding a callback function argument to for enable_timeout() >>> would be better? >> Well, that's not exactly what I was thinking, but maybe your idea is >> better than what I had in mind. >> >>>> Same for the other specific Check*Timeout functions. It seems to me >>>> that the only timeout.c specific requirement is to be able to poke >>>> base_timeouts[].indicator and fin_time. Maybe that can get done in >>>> timeout.c and then have the generic checker call the module-specific >>>> checker. >>> Or instead of static functions, Check* functions can be external >>> to timeout.c. It seemed to be a good idea to move all the timeout >>> related functions to timeout.c. >> Yeah, except that they play with members of the timeout_params struct, >> which hopefully does not need to be exported. So if you can just move >> (that is, put back in their original places) the portions that do not >> touch that strcut, it'd be great. > > OK, all 4 Check* functions are now moved back into proc.c, > nothing outside of timeout.c touches anything in it. New patches > are attached. OK, one last post for today. The #included headers are cleaned up in timeout.c, miscadmin.h and storage/standby.h are not needed. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Excerpts from Boszormenyi Zoltan's message of mar jun 19 12:44:04 -0400 2012: > OK, all 4 Check* functions are now moved back into proc.c, > nothing outside of timeout.c touches anything in it. New patches > are attached. Yeah, I like this one better, thanks. It seems to me that the "check" functions are no longer "check" anymore, right? I mean, they don't check whether the time has expired. It can be argued that CheckDeadLock is well named, because it does check whether there is a deadlock before doing anything else; but CheckStandbyTimeout is no longer a check -- it just sends a signal. Do we need to rename these functions? Why are you using the deadlock timeout for authentication? Wouldn't it make more sense to have a separate TimeoutName, just to keep things clean? The "NB:" comment here doesn't seem to be useful anymore: + /***************************************************************************** + * Init, Destroy and Check functions for different timeouts. + * + * NB: all Check* functions are run inside a signal handler, so be very wary + * about what is done in them or in called routines. + *****************************************************************************/ In base_timeouts you don't initialize fin_time for any of the members. This is probably unimportant but then why initialize start_time? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-06-19 19:19 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of mar jun 19 12:44:04 -0400 2012: > >> OK, all 4 Check* functions are now moved back into proc.c, >> nothing outside of timeout.c touches anything in it. New patches >> are attached. > Yeah, I like this one better, thanks. Thanks. It seems I am on the right track then. :-) > It seems to me that the "check" functions are no longer "check" anymore, > right? I mean, they don't check whether the time has expired. It can > be argued that CheckDeadLock is well named, because it does check > whether there is a deadlock before doing anything else; but > CheckStandbyTimeout is no longer a check -- it just sends a signal. > Do we need to rename these functions? Well, maybe. How about *Function() instead of Check*()? > Why are you using the deadlock timeout for authentication? Because it originally also used the deadlock timeout. I agree that it was abusing code that's written for something else. > Wouldn't it > make more sense to have a separate TimeoutName, just to keep things > clean? Yes, sure. Can you tell me a way to test authentication timeout? "pre_auth_delay" is not good for testing it. Changing authentication_timeout to 1sec both in the GUC list and the initial value of it in postmaster.c plus adding this code below after enabling the auth timeout didn't help. ---8<------8<------8<------8<--- if (AuthenticationTimeout > 0) pg_usleep((AuthenticationTimeout + 2) * 1000000L); ---8<------8<------8<------8<--- I got this despite authentication_timeout being 1 second: ---8<------8<------8<------8<--- ============== removing existing temp installation ============== ============== creating temporary installation ============== ============== initializing database system ============== ============== starting postmaster ============== pg_regress: postmaster did not respond within 60 seconds Examine .../postgresql.1/src/test/regress/log/postmaster.log for the reason ---8<------8<------8<------8<--- Anyway, it doesn't seem to me that the code after enabling auth timeout actually uses anything from the timeout code but the side effect of a signal interrupting read() on the client connection socket and logs "incomplete startup packet" or "invalid length of startup packet". If that's true, then it can use its own (dummy) timeout and I modified postmaster.c and timeout.c accordingly. It survives "make check". > The "NB:" comment here doesn't seem to be useful anymore: > + /***************************************************************************** > + * Init, Destroy and Check functions for different timeouts. > + * > + * NB: all Check* functions are run inside a signal handler, so be very wary > + * about what is done in them or in called routines. > + *****************************************************************************/ I removed it. > In base_timeouts you don't initialize fin_time for any of the members. > This is probably unimportant but then why initialize start_time? The answer is "out of habit" and "leftover". But now the checks are only done for active timeouts, I think neither of these are needed to be initialized now. Thanks for spotting it. It survives "make check". Attached are the new patches with these changes. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
I cleaned up the framework patch a bit. My version's attached. Mainly, returning false for failure in some code paths that are only going to have the caller elog(FATAL) is rather pointless -- it seems much better to just have the code itself do the elog(FATAL). In any case, I searched for reports of those error messages being reported in the wild and saw none. There are other things but they are in the nitpick department. (A reference to "->check_timeout" remains that needs to be fixed too). There is one thing that still bothers me on this whole framework patch, but I'm not sure it's easily fixable. Basically the API to timeout.c is the whole list of timeouts and their whole behaviors. If you want to add a new one, you can't just call into the entry points, you have to modify timeout.c itself (as well as timeout.h as well as whatever code it is that you want to add timeouts to). This may be good enough, but I don't like it. I think it boils down to proctimeout.h is cheating. The interface I would actually like to have is something that lets the modules trying to get a timeout register the timeout-checking function as a callback. API-wise, it would be much cleaner. However, I'm not raelly sure that code-wise it would be any cleaner at all. In fact I think it'd be rather cumbersome. However, if you could give that idea some thought, it'd be swell. I haven't looked at the second patch at all yet. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
2012-06-26 06:59 keltezéssel, Alvaro Herrera írta: > I cleaned up the framework patch a bit. My version's attached. Mainly, > returning false for failure in some code paths that are only going to > have the caller elog(FATAL) is rather pointless -- it seems much better > to just have the code itself do the elog(FATAL). In any case, I > searched for reports of those error messages being reported in the wild > and saw none. OK. The cleanups are always good. One nitpick, though. Your version doesn't contain the timeout.h and compilation fails because of it. :-) You might have forgotten to do "git commit -a". > There are other things but they are in the nitpick department. (A > reference to "->check_timeout" remains that needs to be fixed too). Yes, it's called ->timeout_func() now. > There is one thing that still bothers me on this whole framework patch, > but I'm not sure it's easily fixable. Basically the API to timeout.c is > the whole list of timeouts and their whole behaviors. If you want to > add a new one, you can't just call into the entry points, you have to > modify timeout.c itself (as well as timeout.h as well as whatever code > it is that you want to add timeouts to). This may be good enough, but I > don't like it. I think it boils down to proctimeout.h is cheating. > > The interface I would actually like to have is something that lets the > modules trying to get a timeout register the timeout-checking function > as a callback. API-wise, it would be much cleaner. However, I'm not > raelly sure that code-wise it would be any cleaner at all. In fact I > think it'd be rather cumbersome. However, if you could give that idea > some thought, it'd be swell. Well, I can make the registration interface similar to how LWLocks are treated, but that doesn't avoid modification of the base_timeouts array in case a new internal use case arises. Say: #define USER_TIMEOUTS 4 int n_timeouts = TIMEOUT_MAX; static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS]; and register_timeout() adds data after TIMEOUT_MAX. > I haven't looked at the second patch at all yet. This is the whole point of the first patch. But as I said above for registering a new timeout source, it's a new internal use case. One that touches a lot of places which cannot simply be done by registering a new timeout source. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
On Tue, Jun 26, 2012 at 3:59 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: > Well, I can make the registration interface similar to how LWLocks > are treated, but that doesn't avoid modification of the base_timeouts > array in case a new internal use case arises. Say: > > #define USER_TIMEOUTS 4 > > int n_timeouts = TIMEOUT_MAX; > static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS]; Since timeouts - unlike lwlocks - do not need to touch shared memory, there's no need for a hard-coded limit here. You can just allocate the array using MemoryContextAlloc(TopMemoryContext, ...) and enlarge it as necessary. To avoid needing to modify the base_timeouts array, you can just have internal callers push their entries into the array during process startup using the same function call that an external module would use. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2012-06-26 13:50 keltezéssel, Robert Haas írta: > On Tue, Jun 26, 2012 at 3:59 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: >> Well, I can make the registration interface similar to how LWLocks >> are treated, but that doesn't avoid modification of the base_timeouts >> array in case a new internal use case arises. Say: >> >> #define USER_TIMEOUTS 4 >> >> int n_timeouts = TIMEOUT_MAX; >> static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS]; > Since timeouts - unlike lwlocks - do not need to touch shared memory, > there's no need for a hard-coded limit here. You can just allocate > the array using MemoryContextAlloc(TopMemoryContext, ...) and enlarge > it as necessary. To avoid needing to modify the base_timeouts array, > you can just have internal callers push their entries into the array > during process startup using the same function call that an external > module would use. I know, but it doesn't feel right to "register" static functionality. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
On Tue, Jun 26, 2012 at 8:03 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: > I know, but it doesn't feel right to "register" static functionality. We do it elsewhere. The overhead is pretty minimal compared to other things we already do during startup, and avoiding the need for the array to have a fixed-size seems worth it, IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jun 26, 2012 at 8:03 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: >> I know, but it doesn't feel right to "register" static functionality. > We do it elsewhere. The overhead is pretty minimal compared to other > things we already do during startup, and avoiding the need for the > array to have a fixed-size seems worth it, IMHO. It's not even clear that the array needs to be dynamically resizable (yet). Compare for instance syscache invalidation callbacks, which have so far gotten along fine with a fixed-size array to hold registrations. It's foreseeable that we'll need something better someday, but that day hasn't arrived, and might never arrive. I agree with the feeling that this patch isn't done if the "core" timeout code has to know specifically about each usage. We have that situation already. regards, tom lane
Excerpts from Boszormenyi Zoltan's message of mar jun 26 03:59:06 -0400 2012: > > 2012-06-26 06:59 keltezéssel, Alvaro Herrera írta: > > I cleaned up the framework patch a bit. My version's attached. Mainly, > > returning false for failure in some code paths that are only going to > > have the caller elog(FATAL) is rather pointless -- it seems much better > > to just have the code itself do the elog(FATAL). In any case, I > > searched for reports of those error messages being reported in the wild > > and saw none. > > OK. The cleanups are always good. > > One nitpick, though. Your version doesn't contain the timeout.h > and compilation fails because of it. :-) You might have forgotten > to do "git commit -a". Oops. Attached. It's pretty much the same you had, except some "bools" are changed to void. > > There is one thing that still bothers me on this whole framework patch, > > but I'm not sure it's easily fixable. Basically the API to timeout.c is > > the whole list of timeouts and their whole behaviors. If you want to > > add a new one, you can't just call into the entry points, you have to > > modify timeout.c itself (as well as timeout.h as well as whatever code > > it is that you want to add timeouts to). This may be good enough, but I > > don't like it. I think it boils down to proctimeout.h is cheating. > > > > The interface I would actually like to have is something that lets the > > modules trying to get a timeout register the timeout-checking function > > as a callback. API-wise, it would be much cleaner. However, I'm not > > raelly sure that code-wise it would be any cleaner at all. In fact I > > think it'd be rather cumbersome. However, if you could give that idea > > some thought, it'd be swell. > > Well, I can make the registration interface similar to how LWLocks > are treated, but that doesn't avoid modification of the base_timeouts > array in case a new internal use case arises. Say: > > #define USER_TIMEOUTS 4 > > int n_timeouts = TIMEOUT_MAX; > static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS]; > > and register_timeout() adds data after TIMEOUT_MAX. Well, I don't expect that we're going to have many external uses. My point about registration is so that internal callers use it as well. I was thinking we could do something like xact.c does, which is to have somewhere in proc.c a call like this: TimeoutRegister(DEADLOCK_TIMEOUT, 1, true, CheckDeadLock, GetCurrentTimestamp); at process startup (the magic value 1 is the priority. Maybe there's a better way to handle this). That way, timeout.c or timeout.h do not need to know anything about proc.c at all; we don't need proctimeout.h at all. The only thing it (the timeout module) needs to know is that there is a symbolic constant named DEADLOCK_TIMEOUT. Similarly for statement timeout, etc. When you call enable_timeout you first have to ensure that DEADLOCK_TIMEOUT has been registered; and if it's not, die on an Assert(). That way you ensure that there are no bugs where you try to enable a timeout that hasn't been registered. (If we later find the need to let external modules add timeouts, which I find unlikely, we can have TimeoutRegister return TimeoutName and have this value be dynamically allocated. But for now I think that would be useless complication). The difference with lwlocks is that each lwlock is just a number. The lwlock.c module doesn't need to know anything about how each lwlock is actually used. It's not the same thing here -- which is why I think it would be better to have all modules, even the hardcoded ones, use a registering interface. ... Now, it could be argued that it would be even better to have TimeoutRegister not take the TimeoutName argument, and instead generate it dynamically and pass it back to the caller -- that way you don't need the enum in timeout.h. The problem I have with that idea is that it would force the caller modules to have a global variable to keep track of that, which seems worse to me. > > I haven't looked at the second patch at all yet. > > This is the whole point of the first patch. I know that. But I want to get the details of the framework right before we move on to add more stuff to it. > But as I said above for > registering a new timeout source, it's a new internal use case. > One that touches a lot of places which cannot simply be done > by registering a new timeout source. Sure. That's going to be the case for any other timeout we want to add (which is why I think we don't really need dynamic timeouts). -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
<div class="moz-cite-prefix">2012-06-26 18:12 keltezéssel, Alvaro Herrera írta:<br /></div><blockquote cite="mid:1340726100-sup-3131@alvh.no-ip.org"type="cite"><pre wrap="">Excerpts from Boszormenyi Zoltan's message of mar jun26 03:59:06 -0400 2012: </pre><blockquote type="cite"><pre wrap=""> 2012-06-26 06:59 keltezéssel, Alvaro Herrera írta: </pre><blockquote type="cite"><pre wrap="">I cleaned up the framework patch a bit. My version's attached. Mainly, returning false for failure in some code paths that are only going to have the caller elog(FATAL) is rather pointless -- it seems much better to just have the code itself do the elog(FATAL). In any case, I searched for reports of those error messages being reported in the wild and saw none. </pre></blockquote><pre wrap=""> OK. The cleanups are always good. One nitpick, though. Your version doesn't contain the timeout.h and compilation fails because of it. :-) You might have forgotten to do "git commit -a". </pre></blockquote><pre wrap=""> Oops. Attached. It's pretty much the same you had, except some "bools" are changed to void. </pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">There is one thing that still bothers me on this wholeframework patch, but I'm not sure it's easily fixable. Basically the API to timeout.c is the whole list of timeouts and their whole behaviors. If you want to add a new one, you can't just call into the entry points, you have to modify timeout.c itself (as well as timeout.h as well as whatever code it is that you want to add timeouts to). This may be good enough, but I don't like it. I think it boils down to proctimeout.h is cheating. The interface I would actually like to have is something that lets the modules trying to get a timeout register the timeout-checking function as a callback. API-wise, it would be much cleaner. However, I'm not raelly sure that code-wise it would be any cleaner at all. In fact I think it'd be rather cumbersome. However, if you could give that idea some thought, it'd be swell. </pre></blockquote><pre wrap=""> Well, I can make the registration interface similar to how LWLocks are treated, but that doesn't avoid modification of the base_timeouts array in case a new internal use case arises. Say: #define USER_TIMEOUTS 4 int n_timeouts = TIMEOUT_MAX; static timeout_params base_timeouts[TIMEOUT_MAX + USER_TIMEOUTS]; and register_timeout() adds data after TIMEOUT_MAX. </pre></blockquote><pre wrap=""> Well, I don't expect that we're going to have many external uses. My point about registration is so that internal callers use it as well. I was thinking we could do something like xact.c does, which is to have somewhere in proc.c a call like this: TimeoutRegister(DEADLOCK_TIMEOUT, 1, true, CheckDeadLock, GetCurrentTimestamp); at process startup (the magic value 1 is the priority. Maybe there's a better way to handle this). That way, timeout.c or timeout.h do not need to know anything about proc.c at all; we don't need proctimeout.h at all. The only thing it (the timeout module) needs to know is that there is a symbolic constant named DEADLOCK_TIMEOUT. Similarly for statement timeout, etc. When you call enable_timeout you first have to ensure that DEADLOCK_TIMEOUT has been registered; and if it's not, die on an Assert(). That way you ensure that there are no bugs where you try to enable a timeout that hasn't been registered.</pre></blockquote><br /> Currently, TimeoutName (as an index value) and "priority"is the same<br /> semantically.<br /><br /> I would also add an Assert into register_timeout() to avoid doubleregistration<br /> of the same to prevent overriding he callback function pointer. If that's in<br /> place, the TimeoutNamevalue may still work as is: an index into base_timeouts[].<br /><br /><blockquote cite="mid:1340726100-sup-3131@alvh.no-ip.org"type="cite"><pre wrap="">(If we later find the need to let external modulesadd timeouts, which I find unlikely, we can have TimeoutRegister return TimeoutName and have this value be dynamically allocated. But for now I think that would be useless complication). The difference with lwlocks is that each lwlock is just a number.</pre></blockquote><br /> Strictly speaking, just as TimeoutName.<br/><br /><blockquote cite="mid:1340726100-sup-3131@alvh.no-ip.org" type="cite"><pre wrap=""> The lwlock.c module doesn't need to know anything about how each lwlock is actually used. It's not the same thing here -- which is why I think it would be better to have all modules, even the hardcoded ones, use a registering interface.</pre></blockquote><br /> OK.<br /><br /><blockquote cite="mid:1340726100-sup-3131@alvh.no-ip.org"type="cite"><pre wrap="">... Now, it could be argued that it would be even betterto have TimeoutRegister not take the TimeoutName argument, and instead generate it dynamically and pass it back to the caller -- that way you don't need the enum in timeout.h.</pre></blockquote><br /> This was what I had in mind at first ...<br /><br /><blockquote cite="mid:1340726100-sup-3131@alvh.no-ip.org"type="cite"><pre wrap=""> The problem I have with that idea is that it would force the caller modules to have a global variable to keep track of that, which seems worse to me.</pre></blockquote><br /> ... and realized this as well.<br /><br /> So, should I keep theenum TimeoutName? Are global variables for<br /> keeping dynamically assigned values preferred over the enum?<br /> Currentlywe have 5 timeout sources in total, 3 of them are used by<br /> regular backends, the remaining 2 are used by replicationstandby.<br /> We can have a fixed size array (say with 8 or 16 elements) for future use<br /> and this wouldbe plenty.<br /><br /> Opinions?<br /><br /><blockquote cite="mid:1340726100-sup-3131@alvh.no-ip.org" type="cite"><prewrap=""> </pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">I haven't looked at the second patch at all yet. </pre></blockquote><pre wrap=""> This is the whole point of the first patch. </pre></blockquote><pre wrap=""> I know that. But I want to get the details of the framework right before we move on to add more stuff to it. </pre><blockquote type="cite"><pre wrap="">But as I said above for registering a new timeout source, it's a new internal use case. One that touches a lot of places which cannot simply be done by registering a new timeout source. </pre></blockquote><pre wrap=""> Sure. That's going to be the case for any other timeout we want to add (which is why I think we don't really need dynamic timeouts). </pre><br /><fieldset class="mimeAttachmentHeader"></fieldset><br /><pre wrap=""> </pre></blockquote><br /><br /><pre class="moz-signature" cols="90">-- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: <a class="moz-txt-link-freetext" href="http://www.postgresql-support.de">http://www.postgresql-support.de</a> <aclass="moz-txt-link-freetext" href="http://www.postgresql.at/">http://www.postgresql.at/</a> </pre>
Excerpts from Boszormenyi Zoltan's message of mar jun 26 12:43:34 -0400 2012: > So, should I keep the enum TimeoutName? Are global variables for > keeping dynamically assigned values preferred over the enum? > Currently we have 5 timeout sources in total, 3 of them are used by > regular backends, the remaining 2 are used by replication standby. > We can have a fixed size array (say with 8 or 16 elements) for future use > and this would be plenty. > > Opinions? My opinion is that the fixed size array is fine. I'll go set the patch "waiting on author". Also, remember to review some other people's patches. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-06-26 18:49 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of mar jun 26 12:43:34 -0400 2012: > >> So, should I keep the enum TimeoutName? Are global variables for >> keeping dynamically assigned values preferred over the enum? >> Currently we have 5 timeout sources in total, 3 of them are used by >> regular backends, the remaining 2 are used by replication standby. >> We can have a fixed size array (say with 8 or 16 elements) for future use >> and this would be plenty. >> >> Opinions? > My opinion is that the fixed size array is fine. Attached is the version which uses a registration interface. Also, to further minimize knowledge of timeouts in timeout.c, all GUCs are moved back to proc.c > I'll go set the patch "waiting on author". Also, remember to review > some other people's patches. I will look into it. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
2012-06-27 10:34 keltezéssel, Boszormenyi Zoltan írta: > 2012-06-26 18:49 keltezéssel, Alvaro Herrera írta: >> Excerpts from Boszormenyi Zoltan's message of mar jun 26 12:43:34 -0400 2012: >> >>> So, should I keep the enum TimeoutName? Are global variables for >>> keeping dynamically assigned values preferred over the enum? >>> Currently we have 5 timeout sources in total, 3 of them are used by >>> regular backends, the remaining 2 are used by replication standby. >>> We can have a fixed size array (say with 8 or 16 elements) for future use >>> and this would be plenty. >>> >>> Opinions? >> My opinion is that the fixed size array is fine. > > Attached is the version which uses a registration interface. > > Also, to further minimize knowledge of timeouts in timeout.c, > all GUCs are moved back to proc.c > >> I'll go set the patch "waiting on author". Also, remember to review >> some other people's patches. > > I will look into it. > > Best regards, > Zoltán Böszörményi Does anyone have a little time to look at the latest timeout framework with the registration interface and the 2nd patch too? I am at work until Friday next week, after that I will be on vacation for two weeks. Just in case there is anything that needs tweaking to make it more acceptable. Thanks in advance, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Excerpts from Boszormenyi Zoltan's message of vie jun 29 14:30:28 -0400 2012: > Does anyone have a little time to look at the latest timeout framework > with the registration interface and the 2nd patch too? I am at work > until Friday next week, after that I will be on vacation for two weeks. > Just in case there is anything that needs tweaking to make it more > acceptable. I cleaned up this a bit more and now I think it's ready to commit -- as soon as somebody tests that the standby bits still work. I still have not looked at the second patch. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
I don't understand why PGSemaphoreTimedLock() is not broken. I mean surely you need a bool return to let the caller know whether the acquisition succeeded or failed? AFAICS you are relying on get_timeout_indicator() but this seems to me the wrong thing to do ... (not to mention how ugly it is to percolate through two levels of abstraction) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-07-03 23:31 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of vie jun 29 14:30:28 -0400 2012: > >> Does anyone have a little time to look at the latest timeout framework >> with the registration interface and the 2nd patch too? I am at work >> until Friday next week, after that I will be on vacation for two weeks. >> Just in case there is anything that needs tweaking to make it more >> acceptable. > I cleaned up this a bit more and now I think it's ready to commit -- > as soon as somebody tests that the standby bits still work. You just broke initdb with this cleanup. :-) ---8<------8<------8<------8<------8<------8<------8<--- $ cat src/test/regress/log/initdb.log Running in noclean mode. Mistakes will not be cleaned up. The files belonging to this database system will be owned by user "zozo". This user must also own the server process. The database cluster will be initialized with locales COLLATE: hu_HU.utf8 CTYPE: hu_HU.utf8 MESSAGES: C MONETARY:hu_HU.utf8 NUMERIC: hu_HU.utf8 TIME: hu_HU.utf8 The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "hungarian". creating directory /home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/./tmp_check/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 32MB creating configuration files ... ok creating template1 database in /home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/./tmp_check/data/base/1 ... ok initializing pg_authid ... TRAP: FailedAssertion("!(base_timeouts_initialized)", File: "timeout.c", Line: 217) sh: line 1: 29872 Aborted (core dumped) "/home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/tmp_check/install/home/zozo/pgc92dev-locktimeout/bin/postgres" --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null child process exited with exit code 134 initdb: data directory "/home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/./tmp_check/data" not removed at user's request ---8<------8<------8<------8<------8<------8<------8<--- initdb starts postgres --single, that doesn't do BackendInitialize(), only PostgresMain(). So, you need InitializeTimeouts() before the RegisterTimeout() calls in PostgresMain and the elog(PANIC) must not be in InitializeTimeouts() if called twice. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2012-07-04 12:09 keltezéssel, Boszormenyi Zoltan írta: > 2012-07-03 23:31 keltezéssel, Alvaro Herrera írta: >> Excerpts from Boszormenyi Zoltan's message of vie jun 29 14:30:28 -0400 2012: >> >>> Does anyone have a little time to look at the latest timeout framework >>> with the registration interface and the 2nd patch too? I am at work >>> until Friday next week, after that I will be on vacation for two weeks. >>> Just in case there is anything that needs tweaking to make it more >>> acceptable. >> I cleaned up this a bit more and now I think it's ready to commit -- >> as soon as somebody tests that the standby bits still work. > > You just broke initdb with this cleanup. :-) > > ---8<------8<------8<------8<------8<------8<------8<--- > $ cat src/test/regress/log/initdb.log > Running in noclean mode. Mistakes will not be cleaned up. > The files belonging to this database system will be owned by user "zozo". > This user must also own the server process. > > The database cluster will be initialized with locales > COLLATE: hu_HU.utf8 > CTYPE: hu_HU.utf8 > MESSAGES: C > MONETARY: hu_HU.utf8 > NUMERIC: hu_HU.utf8 > TIME: hu_HU.utf8 > The default database encoding has accordingly been set to "UTF8". > The default text search configuration will be set to "hungarian". > > creating directory > /home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/./tmp_check/data ... ok > creating subdirectories ... ok > selecting default max_connections ... 100 > selecting default shared_buffers ... 32MB > creating configuration files ... ok > creating template1 database in > /home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/./tmp_check/data/base/1 ... ok > initializing pg_authid ... TRAP: FailedAssertion("!(base_timeouts_initialized)", File: > "timeout.c", Line: 217) > sh: line 1: 29872 Aborted (core dumped) > "/home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/tmp_check/install/home/zozo/pgc92dev-locktimeout/bin/postgres" > --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null > child process exited with exit code 134 > initdb: data directory > "/home/zozo/lock-timeout/9.1/1/postgresql.14/src/test/regress/./tmp_check/data" not > removed at user's request > ---8<------8<------8<------8<------8<------8<------8<--- > > initdb starts postgres --single, that doesn't do BackendInitialize(), > only PostgresMain(). So, you need InitializeTimeouts() before > the RegisterTimeout() calls in PostgresMain and the elog(PANIC) > must not be in InitializeTimeouts() if called twice. > > Attached is the fix for this problem. PostgresMain() has a new argument: bool single_user. This way, InitializeTimeouts() can keep its elog(PANIC) if called twice and "postgres --single" doesn't fail its Assert() in RegisterTimeout(). Comments? Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
2012-07-03 23:38 keltezéssel, Alvaro Herrera írta: > > I don't understand why PGSemaphoreTimedLock() is not broken. I mean > surely you need a bool return to let the caller know whether the > acquisition succeeded or failed? Well, this is the same interface PGSemaphoreTryLock() uses. By this reasoning, it's also broken. > AFAICS you are relying on > get_timeout_indicator() but this seems to me the wrong thing to do ... > (not to mention how ugly it is to percolate through two levels of > abstraction) What other way do you suggest? EINTR may come from a different signal, which may also be ignored or not. Ctrl-C is handled and leads to elog(ERROR) but an ignored signal technically calls a NOP handler deep inside the OS runtime libraries but the signal *is* delivered to the backend which in turn interrupts semop() or whatever the platform equivalent is. I can add a flag to timeout.c that is set whenever SIGALRM is delivered but checking that would be another "abstraction violation" as calling get_timeout_indicator() in your opinion. The original coding of PGSemaphoreTryLock() used semtimedop(), sem_timedwait() and the timeout value applied to WaitForMultipleObjectsEx(). This was quickly shot down as using the SIGALRM signal and its behaviour to interrupt the locking operation is be better and fits the PostgreSQL portability features. Also, OS X at the time didn't support sem_timedwait(). I am not complaining, just recalling the different details. How about not hardcoding get_timeout_indicator(LOCK_TIMEOUT) into PGSemaphoreTimedLock()? Passing TimeoutName to it would make it more generic and usable for other timeout sources. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Excerpts from Boszormenyi Zoltan's message of mié jul 04 06:32:46 -0400 2012: > 2012-07-04 12:09 keltezéssel, Boszormenyi Zoltan írta: > > You just broke initdb with this cleanup. :-) Ouch. > > initdb starts postgres --single, that doesn't do BackendInitialize(), > > only PostgresMain(). So, you need InitializeTimeouts() before > > the RegisterTimeout() calls in PostgresMain and the elog(PANIC) > > must not be in InitializeTimeouts() if called twice. > > Attached is the fix for this problem. PostgresMain() has a new > argument: bool single_user. This way, InitializeTimeouts() can > keep its elog(PANIC) if called twice and "postgres --single" > doesn't fail its Assert() in RegisterTimeout(). Hmm. Maybe it's better to leave InitializeTimeouts to be called twice after all. The fix seems a lot uglier than the disease it's curing. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Boszormenyi Zoltan's message of mié jul 04 07:03:44 -0400 2012: > > 2012-07-03 23:38 keltezéssel, Alvaro Herrera írta: > > > > I don't understand why PGSemaphoreTimedLock() is not broken. I mean > > surely you need a bool return to let the caller know whether the > > acquisition succeeded or failed? > > Well, this is the same interface PGSemaphoreTryLock() uses. > By this reasoning, it's also broken. Eh, no -- as far as I can see, PGSemaphoreTryLock returns a boolean, which is precisely my point. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-07-04 17:25 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of mié jul 04 07:03:44 -0400 2012: >> 2012-07-03 23:38 keltezéssel, Alvaro Herrera írta: >>> I don't understand why PGSemaphoreTimedLock() is not broken. I mean >>> surely you need a bool return to let the caller know whether the >>> acquisition succeeded or failed? >> Well, this is the same interface PGSemaphoreTryLock() uses. >> By this reasoning, it's also broken. > Eh, no -- as far as I can see, PGSemaphoreTryLock returns a boolean, > which is precisely my point. You're right. I blame the heat for not being able to properly read my own code. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2012-07-04 17:20 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of mié jul 04 06:32:46 -0400 2012: >> 2012-07-04 12:09 keltezéssel, Boszormenyi Zoltan írta: >>> You just broke initdb with this cleanup. :-) > Ouch. > >>> initdb starts postgres --single, that doesn't do BackendInitialize(), >>> only PostgresMain(). So, you need InitializeTimeouts() before >>> the RegisterTimeout() calls in PostgresMain and the elog(PANIC) >>> must not be in InitializeTimeouts() if called twice. >> Attached is the fix for this problem. PostgresMain() has a new >> argument: bool single_user. This way, InitializeTimeouts() can >> keep its elog(PANIC) if called twice and "postgres --single" >> doesn't fail its Assert() in RegisterTimeout(). > Hmm. Maybe it's better to leave InitializeTimeouts to be called twice > after all. The fix seems a lot uglier than the disease it's curing. Attached are the refreshed patches. InitializeTimeouts() can be called twice and PGSemaphoreTimedLock() returns bool now. This saves two calls to get_timeout_indicator(). -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Boszormenyi Zoltan <zb@cybertec.at> writes: > Attached are the refreshed patches. InitializeTimeouts() can be called > twice and PGSemaphoreTimedLock() returns bool now. This saves > two calls to get_timeout_indicator(). I'm starting to look at this patch now. There are a number of cosmetic things I don't care for, the biggest one being the placement of timeout.c under storage/lmgr/. That seems an entirely random place, since the functionality provided has got nothing to do with storage let alone locks. I'm inclined to think that utils/misc/ is about the best option in the existing backend directory hierarchy. Anybody object to that, or have a better idea? Another thing that needs some discussion is the handling of InitializeTimeouts. As designed, I think it's completely unsafe, the reason being that if a process using timeouts forks off another one, the child will inherit the parent's timeout reasons and be unable to reset them. Right now this might not be such a big problem because the postmaster doesn't need any timeouts, but what if it does in the future? So I think we should drop the base_timeouts_initialized "protection", and that means we need a pretty consistent scheme for where to call InitializeTimeouts. But we already have the same issue with respect to on_proc_exit callbacks, so we can just add InitializeTimeouts calls in the same places as on_exit_reset(). Comments? I'll work up a revised patch and post it. regards, tom lane
Excerpts from Tom Lane's message of mié jul 11 15:47:47 -0400 2012: > > Boszormenyi Zoltan <zb@cybertec.at> writes: > > Attached are the refreshed patches. InitializeTimeouts() can be called > > twice and PGSemaphoreTimedLock() returns bool now. This saves > > two calls to get_timeout_indicator(). > > I'm starting to look at this patch now. There are a number of cosmetic > things I don't care for, the biggest one being the placement of > timeout.c under storage/lmgr/. That seems an entirely random place, > since the functionality provided has got nothing to do with storage > let alone locks. I'm inclined to think that utils/misc/ is about > the best option in the existing backend directory hierarchy. Anybody > object to that, or have a better idea? I agree with the proposed new location. > Another thing that needs some discussion is the handling of > InitializeTimeouts. As designed, I think it's completely unsafe, > the reason being that if a process using timeouts forks off another > one, the child will inherit the parent's timeout reasons and be unable > to reset them. Right now this might not be such a big problem because > the postmaster doesn't need any timeouts, but what if it does in the > future? So I think we should drop the base_timeouts_initialized > "protection", and that means we need a pretty consistent scheme for > where to call InitializeTimeouts. But we already have the same issue > with respect to on_proc_exit callbacks, so we can just add > InitializeTimeouts calls in the same places as on_exit_reset(). I do agree that InitializeTimeouts is not optimally placed. We discussed this upthread. Some of the calls of on_exit_reset() are placed in code that's about to die. Surely we don't need InitializeTimeouts() then. Maybe we should have another routine, say InitializeProcess (noting we already InitProcess so maybe some name would be good), that calls both on_exit_reset and InitializeTimeouts. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
I wrote: > I'm starting to look at this patch now. After reading this further, I think that the "sched_next" option is a bad idea and we should get rid of it. AFAICT, what it is meant to do is (if !sched_next) automatically do "disable_all_timeouts(true)" if the particular timeout happens to fire. But there is no reason the timeout's callback function couldn't do that; and doing it in the callback is more flexible since you could have logic about whether to do it or not, rather than freezing the decision at RegisterTimeout time. Moreover, it does not seem to me to be a particularly good idea to encourage timeouts to have such behavior, anyway. Each time we add another timeout we'd have to look to see if it's still sane for each existing timeout to use !sched_next. It would likely be better, in most cases, for individual callbacks to explicitly disable any other individual timeout reasons that should no longer be fired. I am also underwhelmed by the "timeout_start" callback function concept. In the first place, that's broken enable_timeout, which incorrectly assumes that the value it gets must be "now" (see its schedule_alarm call). In the second place, it seems fairly likely that callers of get_timeout_start would likewise want the clock time at which the timeout was enabled, not the timeout_start reference time. (If they did want the latter, why couldn't they get it from wherever the callback function had gotten it?) I'm inclined to propose that we drop the timeout_start concept and instead provide two functions for scheduling interrupts: enable_timeout_after(TimeoutName tn, int delay_ms);enable_timeout_at(TimeoutName tn, TimestampTz fin_time); where you use the former if you want the standard GetCurrentTimestamp + n msec calculation, but if you want the stop time calculated in some other way, you calculate it yourself and use the second function. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of mié jul 11 15:47:47 -0400 2012: >> ... that means we need a pretty consistent scheme for >> where to call InitializeTimeouts. But we already have the same issue >> with respect to on_proc_exit callbacks, so we can just add >> InitializeTimeouts calls in the same places as on_exit_reset(). > I do agree that InitializeTimeouts is not optimally placed. We > discussed this upthread. > Some of the calls of on_exit_reset() are placed in code that's about to > die. Surely we don't need InitializeTimeouts() then. Maybe we should > have another routine, say InitializeProcess (noting we already > InitProcess so maybe some name would be good), that calls both > on_exit_reset and InitializeTimeouts. Yeah, I was wondering about that too, but it seems a bit ad-hoc from a modularity standpoint. I gave some consideration to the idea of putting these calls directly into fork_process(), but we'd have to be very sure that there would never be a case where it was incorrect to do them after forking. regards, tom lane
Here is a revised version of the timeout-infrastructure patch. I whacked it around quite a bit, notably: * I decided that the most convenient way to handle the initialization issue was to combine establishment of the signal handler with resetting of the per-process variables. So handle_sig_alarm is no longer global, and InitializeTimeouts is called at the places where we used to do "pqsignal(SIGALRM, handle_sig_alarm);". I believe this is correct because any subprocess that was intending to use SIGALRM must have called that before establishing any timeouts. * BTW, doing that exposed the fact that walsender processes were failing to establish a SIGALRM signal handler, which is a pre-existing bug, because they run a normal authentication transaction during InitPostgres and hence need to be able to cope with deadlock and statement timeouts. I will do something about back-patching a fix for that. * I ended up putting the RegisterTimeout calls for DEADLOCK_TIMEOUT and STATEMENT_TIMEOUT into InitPostgres, ensuring that they'd get done in walsender and autovacuum processes. I'm not totally satisfied with that, but for sure it didn't work to only establish them in regular backends. * I didn't like the "TimeoutName" nomenclature, because to me "name" suggests that the value is a string, not just an enum. So I renamed that to TimeoutId. * I whacked around the logic in timeout.c a fair amount, because it had race conditions if SIGALRM happened while enabling or disabling a timeout. I believe the attached coding is safe, but I'm not totally happy with it from a performance standpoint, because it will do two setitimer calls (a disable and then a re-enable) in many cases where the old code did only one. I think what we ought to do is go ahead and apply this, so that we can have the API nailed down, and then we can revisit the internals of timeout.c to see if we can't get the performance back up. It's clearly a much cleaner design than the old spaghetti logic in proc.c, so I think we ought to go ahead with this independently of whether the second patch gets accepted. I haven't really looked at the second patch yet, but at minimum that will need some rebasing to match the API tweaks here. regards, tom lane
Attachment
2012-07-11 21:47 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> Attached are the refreshed patches. InitializeTimeouts() can be called >> twice and PGSemaphoreTimedLock() returns bool now. This saves >> two calls to get_timeout_indicator(). > I'm starting to look at this patch now. There are a number of cosmetic > things I don't care for, the biggest one being the placement of > timeout.c under storage/lmgr/. That seems an entirely random place, > since the functionality provided has got nothing to do with storage > let alone locks. I'm inclined to think that utils/misc/ is about > the best option in the existing backend directory hierarchy. Anybody > object to that, or have a better idea? Good idea, storage/lmgr/timeout.c was chosen simply because it was born out of files living there. > Another thing that needs some discussion is the handling of > InitializeTimeouts. As designed, I think it's completely unsafe, > the reason being that if a process using timeouts forks off another > one, the child will inherit the parent's timeout reasons and be unable > to reset them. Right now this might not be such a big problem because > the postmaster doesn't need any timeouts, but what if it does in the > future? So I think we should drop the base_timeouts_initialized > "protection", and that means we need a pretty consistent scheme for > where to call InitializeTimeouts. But we already have the same issue > with respect to on_proc_exit callbacks, so we can just add > InitializeTimeouts calls in the same places as on_exit_reset(). > > Comments? > > I'll work up a revised patch and post it. > > regards, tom lane > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2012-07-11 22:59 keltezéssel, Tom Lane írta: > I wrote: >> I'm starting to look at this patch now. > After reading this further, I think that the "sched_next" option is a > bad idea and we should get rid of it. AFAICT, what it is meant to do > is (if !sched_next) automatically do "disable_all_timeouts(true)" if > the particular timeout happens to fire. But there is no reason the > timeout's callback function couldn't do that; and doing it in the > callback is more flexible since you could have logic about whether to do > it or not, rather than freezing the decision at RegisterTimeout time. > Moreover, it does not seem to me to be a particularly good idea to > encourage timeouts to have such behavior, anyway. Each time we add > another timeout we'd have to look to see if it's still sane for each > existing timeout to use !sched_next. It would likely be better, in > most cases, for individual callbacks to explicitly disable any other > individual timeout reasons that should no longer be fired. +1 > I am also underwhelmed by the "timeout_start" callback function concept. It was generalized out of "static TimestampTz timeout_start_time;" in proc.c which is valid if deadlock_timeout is activated. It is used in ProcSleep() for logging the time difference between the time when the timeout was activated and "now" at several places in that function. > In the first place, that's broken enable_timeout, which incorrectly > assumes that the value it gets must be "now" (see its schedule_alarm > call). You're right, it must be fixed. > In the second place, it seems fairly likely that callers of > get_timeout_start would likewise want the clock time at which the > timeout was enabled, not the timeout_start reference time. (If they > did want the latter, why couldn't they get it from wherever the callback > function had gotten it?) I'm inclined to propose that we drop the > timeout_start concept and instead provide two functions for scheduling > interrupts: > > enable_timeout_after(TimeoutName tn, int delay_ms); > enable_timeout_at(TimeoutName tn, TimestampTz fin_time); > > where you use the former if you want the standard GetCurrentTimestamp + > n msec calculation, but if you want the stop time calculated in some > other way, you calculate it yourself and use the second function. > > regards, tom lane > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2012-07-12 19:05 keltezéssel, Tom Lane írta: > Here is a revised version of the timeout-infrastructure patch. > I whacked it around quite a bit, notably: > > * I decided that the most convenient way to handle the initialization > issue was to combine establishment of the signal handler with resetting > of the per-process variables. So handle_sig_alarm is no longer global, > and InitializeTimeouts is called at the places where we used to do > "pqsignal(SIGALRM, handle_sig_alarm);". I believe this is correct > because any subprocess that was intending to use SIGALRM must have > called that before establishing any timeouts. OK. > * BTW, doing that exposed the fact that walsender processes were failing > to establish a SIGALRM signal handler, which is a pre-existing bug, > because they run a normal authentication transaction during InitPostgres > and hence need to be able to cope with deadlock and statement timeouts. > I will do something about back-patching a fix for that. Wow, my work uncovered a pre-existing bug in PostgreSQL. :-) > * I ended up putting the RegisterTimeout calls for DEADLOCK_TIMEOUT > and STATEMENT_TIMEOUT into InitPostgres, ensuring that they'd get > done in walsender and autovacuum processes. I'm not totally satisfied > with that, but for sure it didn't work to only establish them in > regular backends. > > * I didn't like the "TimeoutName" nomenclature, because to me "name" > suggests that the value is a string, not just an enum. So I renamed > that to TimeoutId. OK. > * I whacked around the logic in timeout.c a fair amount, because it > had race conditions if SIGALRM happened while enabling or disabling > a timeout. I believe the attached coding is safe, but I'm not totally > happy with it from a performance standpoint, because it will do two > setitimer calls (a disable and then a re-enable) in many cases where > the old code did only one. Disabling deadlock timeout, a.k.a. disable_sig_alarm(false) in the original code called setitimer() twice if statement timeout was still in effect, it was done in CheckStatementTimeout(). Considering this, I think there is no performance problem with the new code you came up with. > I think what we ought to do is go ahead and apply this, so that we > can have the API nailed down, and then we can revisit the internals > of timeout.c to see if we can't get the performance back up. > It's clearly a much cleaner design than the old spaghetti logic in > proc.c, so I think we ought to go ahead with this independently of > whether the second patch gets accepted. There is one tiny bit you might have broken. You wrote previously: > I am also underwhelmed by the "timeout_start" callback function concept. > In the first place, that's broken enable_timeout, which incorrectly > assumes that the value it gets must be "now" (see its schedule_alarm > call). In the second place, it seems fairly likely that callers of > get_timeout_start would likewise want the clock time at which the > timeout was enabled, not the timeout_start reference time. (If they > did want the latter, why couldn't they get it from wherever the callback > function had gotten it?) I'm inclined to propose that we drop the > timeout_start concept and instead provide two functions for scheduling > interrupts: > > enable_timeout_after(TimeoutName tn, int delay_ms); > enable_timeout_at(TimeoutName tn, TimestampTz fin_time); > > where you use the former if you want the standard GetCurrentTimestamp + > n msec calculation, but if you want the stop time calculated in some > other way, you calculate it yourself and use the second function. It's okay, but you haven't really followed it with STATEMENT_TIMEOUT: -----8<----------8<----------8<----------8<----------8<----- *** 2396,2404 **** /* Set statement timeout running, if any */ /* NB: this mustn't be enableduntil we are within an xact */ if (StatementTimeout > 0) ! enable_sig_alarm(StatementTimeout, true); else ! cancel_from_timeout = false; xact_started = true; } --- 2397,2405 ---- /* Set statement timeout running, if any */ /* NB: this mustn't be enableduntil we are within an xact */ if (StatementTimeout > 0) ! enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout); else ! disable_timeout(STATEMENT_TIMEOUT, false); xact_started = true; } -----8<----------8<----------8<----------8<----------8<----- It means that StatementTimeout losts its precision. It would trigger in the future counting from "now" instead of counting from GetCurrentStatementStartTimestamp(). It should be: enable_timeout_at(STATEMENT_TIMEOUT, TimestampTzPlusMilliseconds(GetCurrentStatementStartTimestamp(), StatementTimeout)); > I haven't really looked at the second patch yet, but at minimum that > will need some rebasing to match the API tweaks here. Yes, I will do that. Thanks for your review and work. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
2012-07-13 22:32 keltezéssel, Boszormenyi Zoltan írta: > 2012-07-12 19:05 keltezéssel, Tom Lane írta: > >> I haven't really looked at the second patch yet, but at minimum that >> will need some rebasing to match the API tweaks here. > > Yes, I will do that. While doing it, I discovered another bug you introduced. enable_timeout_after(..., 0); would set an alarm instead of ignoring it. Try SET deadlock_timeout = 0; Same for enable_timeout_at(..., fin_time): if fin_time points to the past, it enables a huge timeout that wouldn't possibly trigger for short transactions but it's a bug nevertheless. > > Thanks for your review and work. > > Best regards, > Zoltán Böszörményi > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: > It means that StatementTimeout losts its precision. It would trigger > in the future counting from "now" instead of counting from > GetCurrentStatementStartTimestamp(). That is, in fact, better not worse. Note the comment in the existing code: * Begin statement-level timeout * * Note that we compute statement_fin_time with reference to the * statement_timestamp, but apply the specified delay without any * correction; that is, we ignore whatever timehas elapsed since * statement_timestamp was set. In the normal case only a small * interval will haveelapsed and so this doesn't matter, but there * are corner cases (involving multi-statement query strings with * embedded COMMIT or ROLLBACK) where we might re-initialize the * statement timeout long after initialreceipt of the message. In * such cases the enforcement of the statement timeout will be a bit * inconsistent. This annoyance is judged not worth the cost of * performing an additional gettimeofday() here. That is, measuring from GetCurrentStatementStartTimestamp is a hack to save one gettimeofday call, at the cost of making the timeout less accurate, sometimes significantly so. In the new model there isn't any good way to duplicate this kluge (in particular, there's no point in using enable_timeout_at, because that will still make a gettimeofday call). That doesn't bother me too much. I'd rather try to buy back whatever performance was lost by seeing if we can reduce the number of setitimer calls. regards, tom lane
Boszormenyi Zoltan <zb@cybertec.at> writes: > While doing it, I discovered another bug you introduced. > enable_timeout_after(..., 0); would set an alarm instead of ignoring it. > Try SET deadlock_timeout = 0; Hm. I don't think it's a bug for enable_timeout_after(..., 0) to cause a timeout ... but we'll have to change the calling code. Thanks for catching that. > Same for enable_timeout_at(..., fin_time): if fin_time points to the past, > it enables a huge timeout No, it should cause an immediate interrupt, or at least after 1 microsecond. Look at TimestampDifference. regards, tom lane
2012-07-13 23:51 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> While doing it, I discovered another bug you introduced. >> enable_timeout_after(..., 0); would set an alarm instead of ignoring it. >> Try SET deadlock_timeout = 0; > Hm. I don't think it's a bug for enable_timeout_after(..., 0) to cause > a timeout ... but we'll have to change the calling code. Thanks for > catching that. You're welcome. This caused a segfault in my second patch, the code didn't expect enable_timeout_after(..., 0); to set up a timer. So, the calling code should check for the value and not call enable_timeout_*() when it shouldn't, right? It's making the code more obvious for the casual reader, I agree it's better that way. Will you post a new version with callers checking their *Timeout settings or commit it with this change? I can then post a new second patch. Regarding the lock_timeout functionality: the patch can be reduced to about half of its current size and it would be a lot less intrusive if the LockAcquire() callers don't need to report the individual object types and names or OIDs. Do you prefer the verbose ereport()s or a generic one about "lock timeout triggered" in ProcSleep()? >> Same for enable_timeout_at(..., fin_time): if fin_time points to the past, >> it enables a huge timeout > No, it should cause an immediate interrupt, or at least after 1 > microsecond. Look at TimestampDifference. Okay. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Excerpts from Boszormenyi Zoltan's message of vie jul 13 18:11:27 -0400 2012: > Regarding the lock_timeout functionality: the patch can be reduced to > about half of its current size and it would be a lot less intrusive if the > LockAcquire() callers don't need to report the individual object types > and names or OIDs. Do you prefer the verbose ereport()s or a > generic one about "lock timeout triggered" in ProcSleep()? For what it's worth, I would appreciate it if you would post the lock timeout patch for the upcoming commitfest. This one is already almost a month long now. That way we can close this CF item soon and concentrate on the remaining patches. This one has received its fair share of committer attention already, ISTM. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Boszormenyi Zoltan <zb@cybertec.at> writes: >>> Try SET deadlock_timeout = 0; Actually, when I try that I get ERROR: 0 is outside the valid range for parameter "deadlock_timeout" (1 .. 2147483647) So I don't see any bug here. The places that are unconditionally doing "enable_timeout_after(..., DeadlockTimeout);" are perfectly fine. The only place that might need an if-test has already got one: if (StatementTimeout > 0) ! enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout); else ! disable_timeout(STATEMENT_TIMEOUT, false); regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > For what it's worth, I would appreciate it if you would post the lock > timeout patch for the upcoming commitfest. +1. I think it's reasonable to get the infrastructure patch in now, but we are running out of time in this commitfest (and there's still a lot of patches that haven't been reviewed at all). regards, tom lane
2012-07-14 00:36 keltezéssel, Tom Lane írta: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> For what it's worth, I would appreciate it if you would post the lock >> timeout patch for the upcoming commitfest. > +1. I think it's reasonable to get the infrastructure patch in now, > but we are running out of time in this commitfest (and there's still > a lot of patches that haven't been reviewed at all). > > regards, tom lane > OK, I will do that. Thanks for the review. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Excerpts from Tom Lane's message of vie jul 13 18:23:31 -0400 2012: > > Boszormenyi Zoltan <zb@cybertec.at> writes: > >>> Try SET deadlock_timeout = 0; > > Actually, when I try that I get > > ERROR: 0 is outside the valid range for parameter "deadlock_timeout" (1 .. 2147483647) > > So I don't see any bug here. I committed this patch without changing this. If this needs patched, please speak up. I also considered adding a comment on enable_timeout_after about calling it with a delay of 0, but refrained; if anybody thinks it's necessary, suggestions are welcome. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2012-07-17 06:32 keltezéssel, Alvaro Herrera írta: > Excerpts from Tom Lane's message of vie jul 13 18:23:31 -0400 2012: >> Boszormenyi Zoltan <zb@cybertec.at> writes: >>>>> Try SET deadlock_timeout = 0; >> Actually, when I try that I get >> >> ERROR: 0 is outside the valid range for parameter "deadlock_timeout" (1 .. 2147483647) >> >> So I don't see any bug here. > I committed this patch without changing this. If this needs patched, > please speak up. I also considered adding a comment on > enable_timeout_after about calling it with a delay of 0, but refrained; > if anybody thinks it's necessary, suggestions are welcome. Thanks for committing this part. Attached is the revised (and a lot leaner, more generic) lock timeout patch, which introduces new functionality for the timeout registration framework. The new functionality is called "extra timeouts", better naming is welcome. Instead of only the previously defined (deadlock and statement) timeouts, the "extra" timeouts can also be activated from within ProcSleep() in a linked way. The lock timeout is a special case of this functionality. To show this, this patch is split into two again to make reviewing easier. This way, the timeout framework is really usable for external modules, as envisioned by you guys Also, rewriting statement and deadlock timeouts using this functionality and unifying the two registration interfaces may be possible later. But messing up proven and working code is not in the scope of this patch or my job at this point. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Hi, new version with a lot more cleanup is attached. 2012-07-22 22:03 keltezéssel, Boszormenyi Zoltan írta: > Attached is the revised (and a lot leaner, more generic) lock timeout patch, > which introduces new functionality for the timeout registration framework. > The new functionality is called "extra timeouts", better naming is welcome. > Instead of only the previously defined (deadlock and statement) timeouts, > the "extra" timeouts can also be activated from within ProcSleep() in a linked > way. This "mini-framework" is now called "lock manager timeouts" and both deadlock timeout and the new lock timeout belong to it. The little piece of standalone code managing these are in storage/lmgr/lmgrtimeout.c. There is no PGSemaphoreTimedLock() any more. Instead, PGSemaphoreLock() gained a new function argument for checking timeouts. This has three advantages: - There is only one PGSemaphoreLock() implementation and bug fixes like ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec don't need to touch several places. - There is no layering violation between pg_sema.c and proc.c. - The extra function can check other type of conditions from different callers, should the need arise. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Boszormenyi Zoltan <zb@cybertec.at> writes: > new version with a lot more cleanup is attached. I looked at this patch, and frankly I'm rather dismayed. It's a mess. To start at the bottom level, the changes to PGSemaphoreLock broke it, and seem probably unnecessary anyway. As coded, calling the "condition checker" breaks handling of error returns from semop(), unless the checker is careful to preserve errno, which LmgrTimeoutCondition isn't (and really shouldn't need to be anyway). More, if the checker does return true, it causes PGSemaphoreLock to utterly violate its contract: it returns to the caller without having acquired the semaphore, and without even telling the caller so. Worse, if we *did* acquire the semaphore, we might still exit via this path, since the placement of the condition check call ignores the comment a few lines up: * Once we acquire the lock, we do NOT check for an interrupt before * returning. The caller needs to be able to recordownership of the lock * before any interrupt can be accepted. We could possibly fix all this with a redesigned API contract for PGSemaphoreLock, but frankly I do not see a good reason to be tinkering with it at all. We never needed to get it involved with deadlock check handling, and I don't see why that needs to change for lock timeouts. One very good reason why monkeying with PGSemaphoreLock is wrong is that on some platforms a SIGALRM interrupt won't interrupt the semop() call, and thus control would never reach the "checker" anyway. If we're going to throw an error, it must be thrown from the interrupt handler. The whole lmgrtimeout module seems to me to be far more mechanism than is warranted, for too little added functionality. In the first place, there is nothing on the horizon suggesting that we need to let any plug-in code get control here, and if anything the delicacy of what's going on leads me to not wish to expose such a possibility. In the second place, it isn't adding any actually useful functionality, it's just agglomerating some checks. The minimum thing I would want it to do is avoid calling timeout.c multiple times, which is what would happen right now (leading to four extra syscalls per lock acquisition, which is enough new overhead to constitute a strong objection to committing this patch at all). On the whole I think we could forget lmgrtimeout and just hardwire the lock timeout and deadlock check cases. But in any case we're going to need support in timeout.c for enabling/disabling multiple timeouts at once without extra setitimer calls. I'm also not thrilled about the way in which the existing deadlock checking code has been hacked up. As an example, you added this to DeadLockReport(): + if (!DeadLockTimeoutCondition()) + return; which again causes it to violate its contract, namely to report a deadlock, in the most fundamental way -- existing callers aren't expecting it to return *at all*. Surely we can decouple the deadlock and lock timeout cases better than that; or at least if we can't it's a delusion to propose anything like lmgrtimeout in the first place. There's considerable lack of attention to updating comments, too. For instance in WaitOnLock you only bothered to update the comment immediately adjacent to the changed code, and not the two comment blocks above that, which both have specific references to deadlocks being the reason for failure. Also, the "per statement" mode for lock timeout doesn't seem to be any such thing, because it's implemented like this: + case LOCK_TIMEOUT_PER_STMT: + enable_timeout_at(LOCK_TIMEOUT, + TimestampTzPlusMilliseconds( + GetCurrentStatementStartTimestamp(), + LockTimeout)); + break; That doesn't provide anything like "you can spend at most N milliseconds waiting for locks during a statement". What it is is "if you happen to be waiting for a lock N milliseconds after the statement starts, or if you attempt to acquire any lock more than N milliseconds after the statement starts, you lose instantly". I don't think that definition actually adds any useful functionality compared to setting statement_timeout to N milliseconds, and it's certainly wrongly documented. To do what the documentation implies would require tracking and adding up the time spent waiting for locks during a statement. Which might be a good thing to do, especially if the required gettimeofday() calls could be shared with what timeout.c probably has to do anyway at start and stop of a lock wait. But this code doesn't do it. Lastly, I'm not sure where is the best place to be adding the control logic for this, but I'm pretty sure postinit.c is not it. It oughta be somewhere under storage/lmgr/, no? regards, tom lane
2012-09-22 20:49 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> new version with a lot more cleanup is attached. > I looked at this patch, and frankly I'm rather dismayed. It's a mess. Thank you for the kind words. :-) > To start at the bottom level, the changes to PGSemaphoreLock broke it, > and seem probably unnecessary anyway. As coded, calling the "condition > checker" breaks handling of error returns from semop(), unless the checker > is careful to preserve errno, which LmgrTimeoutCondition isn't (and really > shouldn't need to be anyway). I would call it an oversight, nothing more. > More, if the checker does return true, > it causes PGSemaphoreLock to utterly violate its contract: it returns to > the caller without having acquired the semaphore, and without even telling > the caller so. Worse, if we *did* acquire the semaphore, we might still > exit via this path, since the placement of the condition check call > ignores the comment a few lines up: > > * Once we acquire the lock, we do NOT check for an interrupt before > * returning. The caller needs to be able to record ownership of the lock > * before any interrupt can be accepted. > > We could possibly fix all this with a redesigned API contract for > PGSemaphoreLock, but frankly I do not see a good reason to be tinkering > with it at all. We never needed to get it involved with deadlock check > handling, and I don't see why that needs to change for lock timeouts. > > One very good reason why monkeying with PGSemaphoreLock is wrong is that > on some platforms a SIGALRM interrupt won't interrupt the semop() call, > and thus control would never reach the "checker" anyway. If we're going > to throw an error, it must be thrown from the interrupt handler. Then please, explain to me, how on Earth can the current deadlock_timeout can report the error? Sure, I can see the PG_TRY() ... PG_END_TRY() block in lock.c but as far as I can see, nothing in the CheckDeadLock() -> DeadLockCheck() -> DeadLockCheckRecurse() path diverts the code to return to a different address from the signal handler, i.e. there is no elog(ERROR) or ereport(ERROR) even in the DS_HARD_DEADLOCK case, so nothing calls siglongjmp(). So logically the code shouldn't end up in the PG_CATCH() branch. semop() will only get an errno = EINTR when returning from the signal handler and would loop again. Then what makes it return beside being able to lock the semaphore? The conditions in ProcSleep() that e.g. print the lock stats work somehow. > The whole lmgrtimeout module seems to me to be far more mechanism than is > warranted, for too little added functionality. In the first place, there > is nothing on the horizon suggesting that we need to let any plug-in code > get control here, and if anything the delicacy of what's going on leads me > to not wish to expose such a possibility. In the second place, it isn't > adding any actually useful functionality, it's just agglomerating some > checks. The minimum thing I would want it to do is avoid calling > timeout.c multiple times, which is what would happen right now (leading > to four extra syscalls per lock acquisition, which is enough new overhead > to constitute a strong objection to committing this patch at all). > > On the whole I think we could forget lmgrtimeout and just hardwire the > lock timeout and deadlock check cases. But in any case we're going to > need support in timeout.c for enabling/disabling multiple timeouts at > once without extra setitimer calls. OK, so you prefer the previous hardcoding PGSemaphoreTimedLock() that makes every LockAcquire() check its return code and the detailed error message about failed to lock the given object? I will add new functions to timeout.c to remove many timeout sources at once to decrease the amount of syscalls needed. > I'm also not thrilled about the way in which the existing deadlock > checking code has been hacked up. As an example, you added this to > DeadLockReport(): > > + if (!DeadLockTimeoutCondition()) > + return; > > which again causes it to violate its contract, namely to report a > deadlock, in the most fundamental way -- existing callers aren't > expecting it to return *at all*. Existing caller*s*? There is only one caller. The reasoning behind this change was that if the code reaches ReportLmgrTimeoutError() in lock.c then at least one timeout triggered and one *Report function in the chain will do ereport(ERROR). *THAT* would trigger te siglongjmp() ending up in PG_CATCH(). The added lines in DeadLockReport() ensures that the deadlock error is not reported if it didn't trigger. > Surely we can decouple the deadlock > and lock timeout cases better than that; or at least if we can't it's > a delusion to propose anything like lmgrtimeout in the first place. > > There's considerable lack of attention to updating comments, too. > For instance in WaitOnLock you only bothered to update the comment > immediately adjacent to the changed code, and not the two comment > blocks above that, which both have specific references to deadlocks > being the reason for failure. > > Also, the "per statement" mode for lock timeout doesn't seem to be > any such thing, because it's implemented like this: > > + case LOCK_TIMEOUT_PER_STMT: > + enable_timeout_at(LOCK_TIMEOUT, > + TimestampTzPlusMilliseconds( > + GetCurrentStatementStartTimestamp(), > + LockTimeout)); > + break; > > That doesn't provide anything like "you can spend at most N milliseconds > waiting for locks during a statement". What it is is "if you happen to be > waiting for a lock N milliseconds after the statement starts, or if you > attempt to acquire any lock more than N milliseconds after the statement > starts, you lose instantly". I don't think that definition actually adds > any useful functionality compared to setting statement_timeout to N > milliseconds, and it's certainly wrongly documented. OK, I will add the code to "lose instantly" if the code happens to be executed after the specified deadline, this is another oversight on my side. Where can I get brown paper bags in large amounts? :-) But I certainly don't agree with you that the "per statement" variant of the lock timeout is nothing more than statement timeout. What happens if the statement doesn't spend its timeout i.e. in the lockless SELECT case or anything that has to lock but it can get all the locks under the specified time? statement_timeout can trigger the error while the server is already returning the data to the client and this is not useful. Thanks to the single-row processing mode introduced in 9.2, the client can at least get partial data and an error before the next row. I have certainly seen use cases that said: I only allow the statement X time to wait on locks but otherwise give me all data. > To do what the > documentation implies would require tracking and adding up the time spent > waiting for locks during a statement. Which might be a good thing to do, > especially if the required gettimeofday() calls could be shared with what > timeout.c probably has to do anyway at start and stop of a lock wait. I will look into it. > But this code doesn't do it. > > Lastly, I'm not sure where is the best place to be adding the control > logic for this, but I'm pretty sure postinit.c is not it. It oughta be > somewhere under storage/lmgr/, no? Indeed. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Hi, 2012-09-22 20:49 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan <zb@cybertec.at> writes: >> new version with a lot more cleanup is attached. > I looked at this patch, and frankly I'm rather dismayed. It's a mess. I hope you won't find this one a mess. I tried to address all your complaints. > [rather long diatribe on modifying PGSemaphoreLock improperly] I have returned to the previous version that used PGSemaphoreTimedLock and this time I save and restore errno around calling the timeout condition checker. > The whole lmgrtimeout module seems to me to be far more mechanism than is > warranted, for too little added functionality. [...] lmgrtimeout is no more, back to hardcoding things. > The minimum thing I would want it to do is avoid calling > timeout.c multiple times, which is what would happen right now (leading > to four extra syscalls per lock acquisition, which is enough new overhead > to constitute a strong objection to committing this patch at all). I have added enable_multiple_timeouts() and disable_multiple_timeouts() that minimize the number setitimer() calls. > There's considerable lack of attention to updating comments, too. > For instance in WaitOnLock you only bothered to update the comment > immediately adjacent to the changed code, and not the two comment > blocks above that, which both have specific references to deadlocks > being the reason for failure. I modified the comment in question. I hope the wording is right. > Also, the "per statement" mode for lock timeout doesn't seem to be > any such thing, because it's implemented like this: > > + case LOCK_TIMEOUT_PER_STMT: > + enable_timeout_at(LOCK_TIMEOUT, > + TimestampTzPlusMilliseconds( > + GetCurrentStatementStartTimestamp(), > + LockTimeout)); > + break; > > That doesn't provide anything like "you can spend at most N milliseconds > waiting for locks during a statement". What it is is "if you happen to be > waiting for a lock N milliseconds after the statement starts, or if you > attempt to acquire any lock more than N milliseconds after the statement > starts, you lose instantly". I don't think that definition actually adds > any useful functionality compared to setting statement_timeout to N > milliseconds, and it's certainly wrongly documented. To do what the > documentation implies would require tracking and adding up the time spent > waiting for locks during a statement. Which might be a good thing to do, > especially if the required gettimeofday() calls could be shared with what > timeout.c probably has to do anyway at start and stop of a lock wait. > But this code doesn't do it. The code now properly accumulates the time spent in waiting for LOCK_TIMEOUT. This means that if STATEMENT_TIMEOUT and LOCK_TIMEOUT are set to the same value, STATEMENT_TIMEOUT will trigger because it considers the time as one contiguous unit, LOCK_TIMEOUT only accounts the time spent in waiting, not the time spent with useful work. This means that LOCK_TIMEOUT doesn't need any special code in its handler function, it's a NOP. The relation between timeouts is only handled by the timeout.c module. > Lastly, I'm not sure where is the best place to be adding the control > logic for this, but I'm pretty sure postinit.c is not it. It oughta be > somewhere under storage/lmgr/, no? The above change means that there is no control logic outside of storage/lmgr now. I temporarily abandoned the idea of detailed error reporting on the object type and name/ID. WaitOnLock() reports "canceling statement due to lock timeout" and LockAcquire() kept its previous semantics. This can be quickly revived in case of demand, it would be another ~15K patch. I hope you can find another time slot in this CF to review this one. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Hi, this is the latest one, fixing a bug in the accounting of per-statement lock timeout handling and tweaking some comments. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Boszormenyi Zoltan escribió: > Hi, > > this is the latest one, fixing a bug in the accounting > of per-statement lock timeout handling and tweaking > some comments. Tom, are you able to give this patch some more time on this commitfest? (If not, I think it would be fair to boot it to CF3; this is final in a series, there's nothing that depends on it, and there's been good movement on it; there's plenty of time before the devel cycle closes.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Boszormenyi Zoltan escribi�: >> this is the latest one, fixing a bug in the accounting >> of per-statement lock timeout handling and tweaking >> some comments. > Tom, are you able to give this patch some more time on this commitfest? I'm still hoping to get to it, but I've been spending a lot of time on bug fixing rather than patch review lately :-(. If you're hoping to close out the current CF soon, maybe we should just slip it to the next one. regards, tom lane
2012-10-18 20:08 keltezéssel, Tom Lane írta: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> Boszormenyi Zoltan escribió: >>> this is the latest one, fixing a bug in the accounting >>> of per-statement lock timeout handling and tweaking >>> some comments. >> Tom, are you able to give this patch some more time on this commitfest? > I'm still hoping to get to it, but I've been spending a lot of time on > bug fixing rather than patch review lately :-(. If you're hoping to > close out the current CF soon, maybe we should just slip it to the next > one. Fine by me. Thanks. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
On 2012-10-18 22:40:15 +0200, Boszormenyi Zoltan wrote: > 2012-10-18 20:08 keltezéssel, Tom Lane írta: > >Alvaro Herrera <alvherre@2ndquadrant.com> writes: > >>Boszormenyi Zoltan escribió: > >>>this is the latest one, fixing a bug in the accounting > >>>of per-statement lock timeout handling and tweaking > >>>some comments. > >>Tom, are you able to give this patch some more time on this commitfest? > >I'm still hoping to get to it, but I've been spending a lot of time on > >bug fixing rather than patch review lately :-(. If you're hoping to > >close out the current CF soon, maybe we should just slip it to the next > >one. > > Fine by me. Thanks. According to this the current state of the patch should be "Ready for Committer" not "Needs Review" is that right? I changed the state for now... Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
2012-12-08 15:30 keltezéssel, Andres Freund írta: > On 2012-10-18 22:40:15 +0200, Boszormenyi Zoltan wrote: >> 2012-10-18 20:08 keltezéssel, Tom Lane írta: >>> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>>> Boszormenyi Zoltan escribió: >>>>> this is the latest one, fixing a bug in the accounting >>>>> of per-statement lock timeout handling and tweaking >>>>> some comments. >>>> Tom, are you able to give this patch some more time on this commitfest? >>> I'm still hoping to get to it, but I've been spending a lot of time on >>> bug fixing rather than patch review lately :-(. If you're hoping to >>> close out the current CF soon, maybe we should just slip it to the next >>> one. >> Fine by me. Thanks. > According to this the current state of the patch should be "Ready for > Committer" not "Needs Review" is that right? I changed the state for > now... Thanks. I just tried the patch on current GIT HEAD and gives some offset warnings but no rejects. Also, it compiles without warnings and still works as it should. Should I post a new patch that applies cleanly? Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: > Thanks. I just tried the patch on current GIT HEAD and > gives some offset warnings but no rejects. Also, it compiles > without warnings and still works as it should. > Should I post a new patch that applies cleanly? Offsets are not a problem --- if you tried to keep them exact you'd just be making a lot of unnecessary updates. If there were fuzz warnings I'd be more concerned. regards, tom lane