Thread: pgsql: Add isolationtester spec for old heapam.c bug
Add isolationtester spec for old heapam.c bug

In 0e5680f4737a, I fixed a bug in heapam that caused spurious deadlocks
when multiple updates concurrently attempted to modify the old version
of an updated tuple whose new version was key-share locked.  I proposed
an isolationtester spec file that reproduced the bug, but back then
isolationtester wasn't mature enough to be able to run it.  Now that
38f8bdcac498 is in the tree, we can have this spec file too.

Discussion: https://www.postgresql.org/message-id/20141212205254.GC1768%40alvh.no-ip.org

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/c9578135f769072e2597b88402f256a398279c91

Modified Files
--------------
src/test/isolation/expected/tuplelock-update.out | 24 ++++++++++++++++++++
src/test/isolation/isolation_schedule            |  1 +
src/test/isolation/specs/tuplelock-update.spec   | 28 ++++++++++++++++++++++++
3 files changed, 53 insertions(+)
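[Editorial illustration: the following is a rough sketch of the kind of scenario
the commit message describes, not the committed tuplelock-update.spec.  The table,
step names, advisory-lock key, and the trick of embedding the advisory-lock call
in the UPDATE's WHERE clause are all assumptions made for illustration.]

# Sketch only.  s0 holds an advisory lock; s1..s3 start UPDATEs that block on
# that lock inside their WHERE clause, so their snapshots are taken before s0
# updates the row.  s0 then updates the row (creating a new tuple version),
# key-share locks the new version in an open transaction, and releases the
# advisory lock, so all three blocked UPDATEs chase the update chain from the
# old version to the key-share-locked new one at the same time.
setup
{
  CREATE TABLE t (id int PRIMARY KEY, data int NOT NULL);
  INSERT INTO t VALUES (1, 0);
}

teardown
{
  DROP TABLE t;
}

session "s0"
step "s0_advlock"   { SELECT pg_advisory_lock(1000); }
step "s0_update"    { UPDATE t SET data = data + 1; }
step "s0_begin"     { BEGIN; }
step "s0_keyshare"  { SELECT * FROM t FOR KEY SHARE; }
step "s0_advunlock" { SELECT pg_advisory_unlock(1000); }
step "s0_commit"    { COMMIT; }

session "s1"
step "s1_update" { UPDATE t SET data = data + 1
                   WHERE pg_advisory_lock_shared(1000) IS NOT NULL; }

session "s2"
step "s2_update" { UPDATE t SET data = data + 1
                   WHERE pg_advisory_lock_shared(1000) IS NOT NULL; }

session "s3"
step "s3_update" { UPDATE t SET data = data + 1
                   WHERE pg_advisory_lock_shared(1000) IS NOT NULL; }

permutation "s0_advlock" "s1_update" "s2_update" "s3_update"
            "s0_update" "s0_begin" "s0_keyshare" "s0_advunlock" "s0_commit"

Because a single advisory lock gates all three updaters, they are all awakened
at once when it is released; the follow-up messages below identify exactly that
as the source of timing-dependent output.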
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Add isolationtester spec for old heapam.c bug

Hmmm ....

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-02-27%2000%3A00%3A06

This failure looks a lot like the timing-related problems I was chasing
last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
stuff that slows it down by similar orders of magnitude.  You seem to need
to think about how to make this test less timing-dependent.

			regards, tom lane
I wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> Add isolationtester spec for old heapam.c bug

> Hmmm ....
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-02-27%2000%3A00%3A06

> This failure looks a lot like the timing-related problems I was chasing
> last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
> spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
> stuff that slows it down by similar orders of magnitude.  You seem to need
> to think about how to make this test less timing-dependent.

The plot thickens:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&dt=2016-02-29%2016%3A17%3A01

guaibasaurus is not a particularly slow machine, and it's not using any
special build flags AFAICT.  So I'm not sure what to make of this case,
except that it proves the timing problem can manifest on normal builds.

			regards, tom lane
Tom Lane wrote:
> I wrote:
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> >> Add isolationtester spec for old heapam.c bug
>
> > Hmmm ....
>
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-02-27%2000%3A00%3A06
>
> > This failure looks a lot like the timing-related problems I was chasing
> > last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
> > spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
> > stuff that slows it down by similar orders of magnitude.  You seem to need
> > to think about how to make this test less timing-dependent.
>
> The plot thickens:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&dt=2016-02-29%2016%3A17%3A01
>
> guaibasaurus is not a particularly slow machine, and it's not using any
> special build flags AFAICT.  So I'm not sure what to make of this case,
> except that it proves the timing problem can manifest on normal builds.

Hmm, I suppose I could fix this by using three different advisory locks
rather than a single one.  (My assumption is that the timing dependency
is the order in which the backends are awakened when the advisory lock
is released.)  I would release the locks one by one rather than all
together.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
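[Editorial illustration: a sketch of that proposal under the same assumed table
and naming as the earlier sketch; the keys 1000/2000/3000 and step names are
illustrative, not the change that was eventually committed.]

# Sketch of the proposed fix: one advisory lock per waiting session, all
# grabbed up front by s0 and released one at a time, so the order in which
# the blocked updaters are awakened no longer depends on the lock manager.
session "s0"
step "s0_advlock"    { SELECT pg_advisory_lock(1000),
                              pg_advisory_lock(2000),
                              pg_advisory_lock(3000); }
step "s0_update"     { UPDATE t SET data = data + 1; }
step "s0_begin"      { BEGIN; }
step "s0_keyshare"   { SELECT * FROM t FOR KEY SHARE; }
step "s0_advunlock1" { SELECT pg_advisory_unlock(1000); }
step "s0_advunlock2" { SELECT pg_advisory_unlock(2000); }
step "s0_advunlock3" { SELECT pg_advisory_unlock(3000); }
step "s0_commit"     { COMMIT; }

session "s1"
step "s1_update" { UPDATE t SET data = data + 1
                   WHERE pg_advisory_lock(1000) IS NOT NULL; }

session "s2"
step "s2_update" { UPDATE t SET data = data + 1
                   WHERE pg_advisory_lock(2000) IS NOT NULL; }

session "s3"
step "s3_update" { UPDATE t SET data = data + 1
                   WHERE pg_advisory_lock(3000) IS NOT NULL; }

permutation "s0_advlock" "s1_update" "s2_update" "s3_update"
            "s0_update" "s0_begin" "s0_keyshare"
            "s0_advunlock1" "s0_advunlock2" "s0_advunlock3" "s0_commit"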
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> guaibasaurus is not a particularly slow machine, and it's not using any
>> special build flags AFAICT.  So I'm not sure what to make of this case,
>> except that it proves the timing problem can manifest on normal builds.

> Hmm, I suppose I could fix this by using three different advisory locks
> rather than a single one.  (My assumption is that the timing dependency
> is the order in which the backends are awakened when the advisory lock
> is released.)  I would release the locks one by one rather than all
> together.

Sounds plausible.  You would probably need several seconds' pg_sleep()
in between the lock releases to ensure that even on slow/overloaded
machines, there's enough time for all wakened backends to do what
they're supposed to do.

			regards, tom lane
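[Editorial illustration: combining Tom's suggestion with the sketch above, the
release steps could be padded with pg_sleep(); the 3-second figure is an
arbitrary placeholder, not a value taken from the thread or the tree.]

# Sketch: give each awakened backend time to reach the tuple lock before the
# next advisory lock is released.  Isolation spec steps may contain several
# statements, so the sleep can ride along in the same step.
step "s0_advunlock1" { SELECT pg_advisory_unlock(1000); SELECT pg_sleep(3); }
step "s0_advunlock2" { SELECT pg_advisory_unlock(2000); SELECT pg_sleep(3); }
step "s0_advunlock3" { SELECT pg_advisory_unlock(3000); }

The trade-off, as Tom notes, is that the sleeps must be long enough for
slow or overloaded machines, so the test pays that cost on every run even
on fast hardware.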