Thread: pgsql: Add isolationtester spec for old heapam.c bug

pgsql: Add isolationtester spec for old heapam.c bug

From
Alvaro Herrera
Date:
Add isolationtester spec for old heapam.c bug

In 0e5680f4737a, I fixed a bug in heapam that caused spurious deadlocks
when multiple updates concurrently attempted to modify the old version
of an updated tuple whose new version was key-share locked.  I proposed
an isolationtester spec file that reproduced the bug, but back then
isolationtester wasn't mature enough to be able to run it.  Now that
38f8bdcac498 is in the tree, we can have this spec file too.

Discussion: https://www.postgresql.org/message-id/20141212205254.GC1768%40alvh.no-ip.org

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/c9578135f769072e2597b88402f256a398279c91

Modified Files
--------------
src/test/isolation/expected/tuplelock-update.out | 24 ++++++++++++++++++++
src/test/isolation/isolation_schedule            |  1 +
src/test/isolation/specs/tuplelock-update.spec   | 28 ++++++++++++++++++++++++
3 files changed, 53 insertions(+)


Re: pgsql: Add isolationtester spec for old heapam.c bug

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Add isolationtester spec for old heapam.c bug

Hmmm ....

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-02-27%2000%3A00%3A06

This failure looks a lot like the timing-related problems I was chasing
last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
stuff that slows it down by similar orders of magnitude.  You seem to need
to think about how to make this test less timing-dependent.

            regards, tom lane


Re: pgsql: Add isolationtester spec for old heapam.c bug

From
Tom Lane
Date:
I wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> Add isolationtester spec for old heapam.c bug

> Hmmm ....

> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-02-27%2000%3A00%3A06

> This failure looks a lot like the timing-related problems I was chasing
> last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
> spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
> stuff that slows it down by similar orders of magnitude.  You seem to need
> to think about how to make this test less timing-dependent.

The plot thickens:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&dt=2016-02-29%2016%3A17%3A01

guaibasaurus is not a particularly slow machine, and it's not using any
special build flags AFAICT.  So I'm not sure what to make of this case,
except that it proves the timing problem can manifest on normal builds.

            regards, tom lane


Re: pgsql: Add isolationtester spec for old heapam.c bug

From
Alvaro Herrera
Date:
Tom Lane wrote:
> I wrote:
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> >> Add isolationtester spec for old heapam.c bug
>
> > Hmmm ....
>
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-02-27%2000%3A00%3A06
>
> > This failure looks a lot like the timing-related problems I was chasing
> > last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
> > spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
> > stuff that slows it down by similar orders of magnitude.  You seem to need
> > to think about how to make this test less timing-dependent.
>
> The plot thickens:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&dt=2016-02-29%2016%3A17%3A01
>
> guaibasaurus is not a particularly slow machine, and it's not using any
> special build flags AFAICT.  So I'm not sure what to make of this case,
> except that it proves the timing problem can manifest on normal builds.

Hmm, I suppose I could fix this by using three different advisory locks
rather than a single one.  (My assumption is that the timing dependency
is the order in which the backends are awakened when the advisory lock
is released.)  I would release the locks one by one rather than all
together.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Add isolationtester spec for old heapam.c bug

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> guaibasaurus is not a particularly slow machine, and it's not using any
>> special build flags AFAICT.  So I'm not sure what to make of this case,
>> except that it proves the timing problem can manifest on normal builds.

> Hmm, I suppose I could fix this by using three different advisory locks
> rather than a single one.  (My assumption is that the timing dependency
> is the order in which the backends are awakened when the advisory lock
> is released.)  I would release the locks one by one rather than all
> together.

Sounds plausible.  You would probably need several seconds' pg_sleep() in
between the lock releases to ensure that even on slow/overloaded machines,
there's enough time for all wakened backends to do what they're supposed
to do.

            regards, tom lane