Thread: Adding a test for speculative insert abort case

Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

01 May 2019, 00:15:55

Today, while poking around the table_complete_speculative code which Ashwin
mentioned in [1], we were trying to understand when exactly we would complete a
speculative insert by aborting.

We added a logging message to heapam_tuple_complete_speculative before it calls
heap_abort_speculative and ran the regression and isolation tests to see what
test failed so we knew how to exercise this codepath.
No tests failed, so we spent some time trying to understand when 'succeeded'
would be true coming into heap_tuple_complete_speculative.

Eventually, we figured out that if one transaction speculatively inserts a
tuple into a table with a unique index and then pauses before inserting the
value into the index, and while it is paused, another transaction successfully
inserts a value which would conflict with that value, it would result in an
aborted speculative insertion.

t1(id,val)
unique index t1(id)

s1: insert into t1 values(1, 'someval') on conflict(id) do update set val = 'someotherval';
s1: pause in ExecInsert before calling ExecInsertIndexTuples
s2: insert into t1 values(1, 'someval');
s2: continue

We don't know of a way to add this scenario to the current isolation framework.

Can anyone think of a good way to put this codepath under test?

- Melanie & Ashwin

[1] https://www.postgresql.org/message-id/CALfoeitk7-TACwYv3hCw45FNPjkA86RfXg4iQ5kAOPhR%2BF1Y4w%40mail.gmail.com

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

01 May 2019, 00:22:23

Hi,

On 2019-04-30 17:15:55 -0700, Melanie Plageman wrote:
> Today, while poking around the table_complete_speculative code which Ashwin
> mentioned in [1], we were trying to understand when exactly we would
> complete a
> speculative insert by aborting.

(FWIW, it's on my todo queue to look at this)


> We added a logging message to heapam_tuple_complete_speculative before it
> calls
> heap_abort_speculative and ran the regression and isolation tests to see
> what
> test failed so we knew how to exercise this codepath.
> No tests failed, so we spent some time trying to understand when 'succeeded'
> would be true coming into heap_tuple_complete_speculative.
> 
> Eventually, we figured out that if one transaction speculatively inserts a
> tuple into a table with a unique index and then pauses before inserting the
> value into the index, and while it is paused, another transaction
> successfully
> inserts a value which would conflict with that value, it would result in an
> aborted speculative insertion.
> 
> t1(id,val)
> unique index t1(id)
> 
> s1: insert into t1 values(1, 'someval') on conflict(id) do update set val =
> 'someotherval';
> s1: pause in ExecInsert before calling ExecInsertIndexTuples
> s2: insert into t1 values(1, 'someval');
> s2: continue
> 
> We don't know of a way to add this scenario to the current isolation
> framework.

> Can anyone think of a good way to put this codepath under test?

Not easily so - that's why the ON CONFLICT patch didn't add code
coverage for it :(. I wonder if you could whip something up by having
another non-unique expression index, where the expression acquires a
advisory lock? If that advisory lock where previously acquired by
another session, that should allow to write a reliable isolation test?

Alternatively, as a fallback, there's a short pgbench test, I wonder if we
could just adapt that to use ON CONFLICT UPDATE?

Greetings,

Andres Freund

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

01 May 2019, 01:34:42

On Tue, Apr 30, 2019 at 5:22 PM Andres Freund <andres@anarazel.de> wrote:

Not easily so - that's why the ON CONFLICT patch didn't add code
coverage for it :(. I wonder if you could whip something up by having
another non-unique expression index, where the expression acquires a
advisory lock? If that advisory lock where previously acquired by
another session, that should allow to write a reliable isolation test?

So, I took a look at one of the existing tests that does something like what
you mentioned and tried the following:
----------
create table t1(key int, val text);
create unique index t1_uniq_idx on t1(key);
create or replace function t1_lock_func(int) returns int immutable language sql AS
'select pg_advisory_xact_lock_shared(1); select $1';
create index t1_lock_idx ON t1(t1_lock_func(key));
----------
s1:
begin isolation level read committed;
insert into t1 values(1, 'someval');
s2:
set default_transaction_isolation = 'read committed';
insert into t1 values(1, 'anyval') on conflict(key) do update set val = 'updatedval';
----------

So, the above doesn't work because s2 waits to acquire the lock in the first
phase of the speculative insert -- when it is just checking the index, before
inserting to the table and before inserting to the index.

Then when the s1 is committed, we won't execute the speculative insert code at
all and will go into ExecOnConflictUpdate instead.

Maybe I just need a different kind of advisory lock to allow
ExecCheckIndexConstraints to be able to check the index here. I figured it is a
read operation, so a shared advisory lock should be okay, but it seems like it
is not okay

Without knowing any of the context, on an initial pass of debugging, I did
notice that, in the initial check of the index by s2, XactLockTableWait is
called with reason_wait as XLTW_InsertIndex (even though we are just trying to
check it, so maybe it knows our intentions:))

Is there something I can do in the test to allow my check to go
through but the insert to have to wait?

Melanie Plageman

Re: Adding a test for speculative insert abort case

From

Peter Geoghegan

Date:

01 May 2019, 01:43:08

On Tue, Apr 30, 2019 at 5:16 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Can anyone think of a good way to put this codepath under test?

During the initial development of ON CONFLICT, speculative insertion
itself was tested using custom stress testing that you can still get
here:

https://github.com/petergeoghegan/jjanes_upsert

I'm not sure that this is something that you can adopt, but I
certainly found it very useful at the time. It tests whether or not
there is agreement among concurrent speculative inserters, and whether
or not there are "unprincipled deadlocks" (user hostile deadlocks that
cannot be fixed by reordering something in application code).

-- 
Peter Geoghegan

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

01 May 2019, 02:09:51

Hi,

On April 30, 2019 6:43:08 PM PDT, Peter Geoghegan <pg@bowt.ie> wrote:
>On Tue, Apr 30, 2019 at 5:16 PM Melanie Plageman
><melanieplageman@gmail.com> wrote:
>> Can anyone think of a good way to put this codepath under test?
>
>During the initial development of ON CONFLICT, speculative insertion
>itself was tested using custom stress testing that you can still get
>here:
>
>https://github.com/petergeoghegan/jjanes_upsert
>
>I'm not sure that this is something that you can adopt, but I
>certainly found it very useful at the time. It tests whether or not
>there is agreement among concurrent speculative inserters, and whether
>or not there are "unprincipled deadlocks" (user hostile deadlocks that
>cannot be fixed by reordering something in application code).

I think we want a deterministic case. I recall asking for that back then...

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Adding a test for speculative insert abort case

From

Thomas Munro

Date:

01 May 2019, 02:13:46

On Wed, May 1, 2019 at 12:16 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> s1: insert into t1 values(1, 'someval') on conflict(id) do update set val = 'someotherval';
> s1: pause in ExecInsert before calling ExecInsertIndexTuples
> s2: insert into t1 values(1, 'someval');
> s2: continue
>
> We don't know of a way to add this scenario to the current isolation framework.
>
> Can anyone think of a good way to put this codepath under test?

Hi Melanie,

I think it'd be nice to have a set of macros that can create wait
points in the C code that isolation tests can control, in a special
build.  Perhaps there could be shm hash table of named wait points in
shared memory; if DEBUG_WAIT_POINT("foo") finds that "foo" is not
present, it continues, but if it finds an entry it waits for it to go
away.  Then isolation tests could add/remove names and signal a
condition variable to release waiters.

I contemplated that while working on SKIP LOCKED, which had a bunch of
weird edge cases that I tested by inserting throw-away wait-point code
like this:

https://www.postgresql.org/message-id/CADLWmXXss83oiYD0pn_SfQfg%2ByNEpPbPvgDb8w6Fh--jScSybA%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

01 May 2019, 18:18:57

On Tue, Apr 30, 2019 at 7:14 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I think it'd be nice to have a set of macros that can create wait
points in the C code that isolation tests can control, in a special
build. Perhaps there could be shm hash table of named wait points in
shared memory; if DEBUG_WAIT_POINT("foo") finds that "foo" is not
present, it continues, but if it finds an entry it waits for it to go
away. Then isolation tests could add/remove names and signal a
condition variable to release waiters.

I contemplated that while working on SKIP LOCKED, which had a bunch of
weird edge cases that I tested by inserting throw-away wait-point code
like this:

https://www.postgresql.org/message-id/CADLWmXXss83oiYD0pn_SfQfg%2ByNEpPbPvgDb8w6Fh--jScSybA%40mail.gmail.com

Yes, I agree it would be nice to have a framework like this.

Greenplum actually has a fault injection framework that, I believe, works
similarly to what you are describing -- i.e. sets a variable in shared memory.
There is an extension, gp_inject_fault, which allows you to set the faults.
Andreas Scherbaum wrote a blog post about how to use it [1].

The Greenplum implementation is not documented particularly well in the code,
but, it is something that folks working on Greenplum have talked about modifying
and proposing to Postgres.

[1] http://engineering.pivotal.io/post/testing_greenplum_database_using_fault_injection/

Melanie Plageman

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

01 May 2019, 18:41:48

Hi,

On 2019-04-30 18:34:42 -0700, Melanie Plageman wrote:
> On Tue, Apr 30, 2019 at 5:22 PM Andres Freund <andres@anarazel.de> wrote:
>
> >
> > Not easily so - that's why the ON CONFLICT patch didn't add code
> > coverage for it :(. I wonder if you could whip something up by having
> > another non-unique expression index, where the expression acquires a
> > advisory lock? If that advisory lock where previously acquired by
> > another session, that should allow to write a reliable isolation test?
> >
> >
> So, I took a look at one of the existing tests that does something like what
> you mentioned and tried the following:
> ----------
> create table t1(key int, val text);
> create unique index t1_uniq_idx on t1(key);
> create or replace function t1_lock_func(int) returns int immutable language
> sql AS
> 'select pg_advisory_xact_lock_shared(1); select $1';
> create index t1_lock_idx ON t1(t1_lock_func(key));
> ----------
> s1:
> begin isolation level read committed;
> insert into t1 values(1, 'someval');
> s2:
> set default_transaction_isolation = 'read committed';
> insert into t1 values(1, 'anyval') on conflict(key) do update set val =
> 'updatedval';
> ----------
>
> So, the above doesn't work because s2 waits to acquire the lock in the first
> phase of the speculative insert -- when it is just checking the index,
> before
> inserting to the table and before inserting to the index.

Couldn't that be addressed by having t1_lock_func() acquire two locks?
One for blocking during the initial index probe, and one for the
speculative insertion?

I'm imagining something like

if (pg_try_advisory_xact_lock(1))
    pg_advisory_xact_lock(2);
else
    pg_advisory_xact_lock(1);

in t1_lock_func. If you then make the session something roughly like

s1: pg_advisory_xact_lock(1);
s1: pg_advisory_xact_lock(2);

s2: upsert t1 <blocking for 1>
s1: pg_advisory_xact_unlock(1);
s2: <continuing>
s2: <blocking for 2>
s1: insert into t1 values(1, 'someval');
s1: pg_advisory_xact_unlock(2);
s2: <continuing>
s2: spec-conflict

Greetings,

Andres Freund

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

10 May 2019, 21:40:38

Hi,

On 2019-05-01 11:41:48 -0700, Andres Freund wrote:
> I'm imagining something like
> 
> if (pg_try_advisory_xact_lock(1))
>     pg_advisory_xact_lock(2);
> else
>     pg_advisory_xact_lock(1);
> 
> in t1_lock_func. If you then make the session something roughly like
> 
> s1: pg_advisory_xact_lock(1);
> s1: pg_advisory_xact_lock(2);
> 
> s2: upsert t1 <blocking for 1>
> s1: pg_advisory_xact_unlock(1);
> s2: <continuing>
> s2: <blocking for 2>
> s1: insert into t1 values(1, 'someval');
> s1: pg_advisory_xact_unlock(2);
> s2: <continuing>
> s2: spec-conflict

Needed to be slightly more complicated than that, but not that much. See
the attached test.  What do you think?

I think we should apply something like this (minus the WARNING, of
course). It's a bit complicated, but it seems worth covering this
special case.

Greetings,

Andres Freund

Attachment

speculative-conflict.diff

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

14 May 2019, 19:19:14

Hi,

On 2019-05-10 14:40:38 -0700, Andres Freund wrote:
> On 2019-05-01 11:41:48 -0700, Andres Freund wrote:
> > I'm imagining something like
> > 
> > if (pg_try_advisory_xact_lock(1))
> >     pg_advisory_xact_lock(2);
> > else
> >     pg_advisory_xact_lock(1);
> > 
> > in t1_lock_func. If you then make the session something roughly like
> > 
> > s1: pg_advisory_xact_lock(1);
> > s1: pg_advisory_xact_lock(2);
> > 
> > s2: upsert t1 <blocking for 1>
> > s1: pg_advisory_xact_unlock(1);
> > s2: <continuing>
> > s2: <blocking for 2>
> > s1: insert into t1 values(1, 'someval');
> > s1: pg_advisory_xact_unlock(2);
> > s2: <continuing>
> > s2: spec-conflict
> 
> Needed to be slightly more complicated than that, but not that much. See
> the attached test.  What do you think?
> 
> I think we should apply something like this (minus the WARNING, of
> course). It's a bit complicated, but it seems worth covering this
> special case.

And pushed. Let's see what the buildfarm says.

Regards,

Andres

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

16 May 2019, 01:34:15

On Tue, May 14, 2019 at 12:19 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-05-10 14:40:38 -0700, Andres Freund wrote:
> On 2019-05-01 11:41:48 -0700, Andres Freund wrote:
> > I'm imagining something like
> >
> > if (pg_try_advisory_xact_lock(1))
> > pg_advisory_xact_lock(2);
> > else
> > pg_advisory_xact_lock(1);
> >
> > in t1_lock_func. If you then make the session something roughly like
> >
> > s1: pg_advisory_xact_lock(1);
> > s1: pg_advisory_xact_lock(2);
> >
> > s2: upsert t1 <blocking for 1>
> > s1: pg_advisory_xact_unlock(1);
> > s2: <continuing>
> > s2: <blocking for 2>
> > s1: insert into t1 values(1, 'someval');
> > s1: pg_advisory_xact_unlock(2);
> > s2: <continuing>
> > s2: spec-conflict
>
> Needed to be slightly more complicated than that, but not that much. See
> the attached test. What do you think?
>
> I think we should apply something like this (minus the WARNING, of
> course). It's a bit complicated, but it seems worth covering this
> special case.

And pushed. Let's see what the buildfarm says.

Regards,

Andres

So, I recognize this has already been merged. However, after reviewing the test,

I believe there is a typo in the second permutation.

# Test that speculative locks are correctly acquired and released, s2
# inserts, s1 updates.

I think you meant

# Test that speculative locks are correctly acquired and released, s1
# inserts, s2 updates.

Though, I'm actually not sure how this permutation is exercising different
code than the first permutation.

Also, it would make the test easier to understand for me if, for instances of the
word "lock" in the test description and comments, you specified locktype -- e.g.
advisory lock.
I got confused between the speculative lock, the object locks on the index which
are required for probing or inserting into the index, and the advisory locks.

Below is a potential re-wording of one of the permutations that is more explicit
and more clear to me as a reader.

# Test that speculative locks are correctly acquired and released, s2
# inserts, s1 updates.
permutation
# acquire a number of advisory locks, to control execution flow - the
# blurt_and_lock function acquires advisory locks that allow us to
# continue after a) the optimistic conflict probe b) after the
# insertion of the speculative tuple.

"controller_locks"
"controller_show"
# Both sessions will be waiting on advisory locks
"s1_upsert" "s2_upsert"
"controller_show"
# Switch both sessions to wait on the other advisory lock next time
"controller_unlock_1_1" "controller_unlock_2_1"
# Allow both sessions to do the optimistic conflict probe and do the
# speculative insertion into the table
# They will then be waiting on another advisory lock when they attempt to
# update the index
"controller_unlock_1_3" "controller_unlock_2_3"
"controller_show"
# Allow the second session to finish insertion (complete speculative)
"controller_unlock_2_2"
# This should now show a successful insertion
"controller_show"
# Allow the first session to finish insertion (abort speculative)
"controller_unlock_1_2"
# This should now show a successful UPSERT
"controller_show"

I was also wondering: Is it possible that one of the "controller_unlock_*"
functions will get called before the session with the upsert has had a chance to
move forward in its progress and be waiting on that lock?
That is, given that we don't check that the sessions are waiting on the locks
before unlocking them, is there a race condition?

I noticed that there is not a test case which would cover the speculative wait
codepath. This seems much more challenging, however, it does seem like a
worthwhile test to have.

--

Melanie Plageman

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

16 May 2019, 01:50:50

Hi,

On 2019-05-15 18:34:15 -0700, Melanie Plageman wrote:
> So, I recognize this has already been merged. However, after reviewing the
> test,
> I believe there is a typo in the second permutation.
> 
> # Test that speculative locks are correctly acquired and released, s2
> # inserts, s1 updates.
> 
> I think you meant
> 
> # Test that speculative locks are correctly acquired and released, s1
> # inserts, s2 updates.

Hm, yea.


> Though, I'm actually not sure how this permutation is exercising differen.
> code than the first permutation.

I was basically just trying to make sure that there's a sensible result
independent of which transaction "wins", while keeping the command-start
order the same. Probably not crucial, but seemed like a reasonable
addition.


> Also, it would make the test easier to understand for me if, for instances
> of the
> word "lock" in the test description and comments, you specified locktype --
> e.g.
> advisory lock.
> I got confused between the speculative lock, the object locks on the index
> which
> are required for probing or inserting into the index, and the advisory
> locks.
> 
> Below is a potential re-wording of one of the permutations that is more
> explicit
> and more clear to me as a reader.

Minor gripe: For the future, it's easier to such changes as a patch as
well - otherwise others need to move it to the file and diff it to
comment on the changes.


> # Test that speculative locks are correctly acquired and released, s2
> # inserts, s1 updates.
> permutation
>    # acquire a number of advisory locks, to control execution flow - the
>    # blurt_and_lock function acquires advisory locks that allow us to
>    # continue after a) the optimistic conflict probe b) after the
>    # insertion of the speculative tuple.
> 
>    "controller_locks"
>    "controller_show"
>    # Both sessions will be waiting on advisory locks
>    "s1_upsert" "s2_upsert"
>    "controller_show"
>    # Switch both sessions to wait on the other advisory lock next time
>    "controller_unlock_1_1" "controller_unlock_2_1"
>    # Allow both sessions to do the optimistic conflict probe and do the
>    # speculative insertion into the table
>    # They will then be waiting on another advisory lock when they attempt to
>    # update the index
>    "controller_unlock_1_3" "controller_unlock_2_3"
>    "controller_show"
>    # Allow the second session to finish insertion (complete speculative)
>    "controller_unlock_2_2"
>    # This should now show a successful insertion
>    "controller_show"
>    # Allow the first session to finish insertion (abort speculative)
>    "controller_unlock_1_2"
>    # This should now show a successful UPSERT
>    "controller_show"


> I was also wondering: Is it possible that one of the
> "controller_unlock_*" functions will get called before the session
> with the upsert has had a chance to move forward in its progress and
> be waiting on that lock?  That is, given that we don't check that the
> sessions are waiting on the locks before unlocking them, is there a
> race condition?

Isolationtester only switches between commands when either the command
finished, or once it's know to be waiting for a lock. Therefore I don't
think this race exists?  That logic is in the if (flags & STEP_NONBLOCK)
block in isolationtester.c:try_complete_step().

Does that make sense? Or did I misunderstand your concern?


> I noticed that there is not a test case which would cover the speculative
> wait
> codepath. This seems much more challenging, however, it does seem like a
> worthwhile test to have.

Shouldn't be that hard to create, I think. I think acquiring another
lock in a second, non-unique, expression index, ought to do the trick?
It probably has to be created after the unique index (so it's later in
the

Greetings,

Andres Freund

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

16 May 2019, 03:35:49

On Wed, May 15, 2019 at 6:50 PM Andres Freund <andres@anarazel.de> wrote:

> Also, it would make the test easier to understand for me if, for instances
> of the
> word "lock" in the test description and comments, you specified locktype --
> e.g.
> advisory lock.
> I got confused between the speculative lock, the object locks on the index
> which
> are required for probing or inserting into the index, and the advisory
> locks.
>
> Below is a potential re-wording of one of the permutations that is more
> explicit
> and more clear to me as a reader.

Minor gripe: For the future, it's easier to such changes as a patch as
well - otherwise others need to move it to the file and diff it to
comment on the changes.

Will do--attached, though the wording is a rough suggestion.

> I was also wondering: Is it possible that one of the
> "controller_unlock_*" functions will get called before the session
> with the upsert has had a chance to move forward in its progress and
> be waiting on that lock? That is, given that we don't check that the
> sessions are waiting on the locks before unlocking them, is there a
> race condition?

Isolationtester only switches between commands when either the command
finished, or once it's know to be waiting for a lock. Therefore I don't
think this race exists? That logic is in the if (flags & STEP_NONBLOCK)
block in isolationtester.c:try_complete_step().

Does that make sense? Or did I misunderstand your concern?

I see. I didn't know what the blocking/waiting logic was in the isolation
framework. Nevermind, then.

> I noticed that there is not a test case which would cover the speculative
> wait
> codepath. This seems much more challenging, however, it does seem like a
> worthwhile test to have.

Shouldn't be that hard to create, I think. I think acquiring another
lock in a second, non-unique, expression index, ought to do the trick?
It probably has to be created after the unique index (so it's later in
the

I would think that the sequence would be s1 and s2 probe the index, s1 and s2
insert into the table, s1 updates the index but does not complete the
speculative insert and clear the token (pause before
table_complete_speculative). s2 is in speculative wait when attempting to update
the index.

Something like

permutation
"controller_locks"
"controller_show"
"s1_upsert" "s2_upsert"
"controller_show"
"controller_unlock_1_1" "controller_unlock_2_1"
"controller_unlock_1_3" "controller_unlock_2_3"
"controller_unlock_1_2"
"s1_magically_pause_before_complete_speculative"
# put s2 in speculative wait
"controller_unlock_2_2"
"s1_magically_unpause_before_complete_speculative"

So, how would another lock on another index keep s1 from clearing the
speculative token after it has updated the index?

Melanie Plageman

Attachment

specconflict_permutation_comment_update.patch

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

16 May 2019, 03:46:37

Hi,

On 2019-05-15 20:35:49 -0700, Melanie Plageman wrote:
> > > I noticed that there is not a test case which would cover the speculative
> > > wait
> > > codepath. This seems much more challenging, however, it does seem like a
> > > worthwhile test to have.
> >
> > Shouldn't be that hard to create, I think. I think acquiring another
> > lock in a second, non-unique, expression index, ought to do the trick?
> > It probably has to be created after the unique index (so it's later in
> > the
> >
> >
> I would think that the sequence would be s1 and s2 probe the index, s1 and
> s2
> insert into the table, s1 updates the index but does not complete the
> speculative insert and clear the token (pause before
> table_complete_speculative). s2 is in speculative wait when attempting to
> update
> the index.
> 
> Something like
> 
> permutation
>    "controller_locks"
>    "controller_show"
>    "s1_upsert" "s2_upsert"
>    "controller_show"
>    "controller_unlock_1_1" "controller_unlock_2_1"
>    "controller_unlock_1_3" "controller_unlock_2_3"
>    "controller_unlock_1_2"
>    "s1_magically_pause_before_complete_speculative"
>    # put s2 in speculative wait
>    "controller_unlock_2_2"
>    "s1_magically_unpause_before_complete_speculative"
> 
> So, how would another lock on another index keep s1 from clearing the
> speculative token after it has updated the index?

If there were a second index on upserttest, something like CREATE INDEX
ON upserttest((blurt_and_lock2(key))); and blurt_and_lock2 acquired a
lock on (current_setting('spec.session')::int, 4), ISTM you could cause
a block to happen after the first index (the unique one, used for ON
CONFLICT) successfully created the index entry, but before
complete_speculative is called.  Shouldn't that fulfil your
s1_magically_pause_before_complete_speculative goal? The
controller_locks would only acquire the (1, 4) lock, thereby *not*
blocking s2 (or you could just release the lock in a separate step).

Greetings,

Andres Freund

Re: Adding a test for speculative insert abort case

From

Ashwin Agrawal

Date:

16 May 2019, 05:32:34

On Wed, May 15, 2019 at 8:36 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

On Wed, May 15, 2019 at 6:50 PM Andres Freund <andres@anarazel.de> wrote:

> I noticed that there is not a test case which would cover the speculative
> wait
> codepath. This seems much more challenging, however, it does seem like a
> worthwhile test to have.

Shouldn't be that hard to create, I think. I think acquiring another
lock in a second, non-unique, expression index, ought to do the trick?
It probably has to be created after the unique index (so it's later in
the

I would think that the sequence would be s1 and s2 probe the index, s1 and s2
insert into the table, s1 updates the index but does not complete the
speculative insert and clear the token (pause before
table_complete_speculative). s2 is in speculative wait when attempting to update
the index.

Something like

permutation
"controller_locks"
"controller_show"
"s1_upsert" "s2_upsert"
"controller_show"
"controller_unlock_1_1" "controller_unlock_2_1"
"controller_unlock_1_3" "controller_unlock_2_3"
"controller_unlock_1_2"
"s1_magically_pause_before_complete_speculative"
# put s2 in speculative wait
"controller_unlock_2_2"
"s1_magically_unpause_before_complete_speculative"

So, how would another lock on another index keep s1 from clearing the
speculative token after it has updated the index?

The second index would help to hold the session after inserting the tuple in unique index but before completing the speculative insert. Hence, helps to create the condition easily. I believe order of index insertion is helping here that unique index is inserted and then non-unique index is inserted too.

Attaching patch with the test using the idea Andres mentioned and it works to excercise the speculative wait.

Attachment

0001-Add-test-to-validate-speculative-wait-is-performed.patch

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

16 May 2019, 20:59:47

On Wed, May 15, 2019 at 10:32 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

The second index would help to hold the session after inserting the tuple in unique index but before completing the speculative insert. Hence, helps to create the condition easily. I believe order of index insertion is helping here that unique index is inserted and then non-unique index is inserted too.

Oh, cool. I didn't know that execution order would be guaranteed for which index
to insert into first.

Attaching patch with the test using the idea Andres mentioned and it works to excercise the speculative wait.

It looks good.
I thought it would be helpful to mention why you have s1 create the non-unique
index after the permutation has begun. You don't want this index to influence
the behavior of the other permutations--this part makes sense. However, why have
s1 do it instead of the controller?

I added a couple suggested changes to the comments in the permutation in the
version in the patch I attached. Note that I did not update the answer files.
(These suggested changes to comments are in distinct from and would be in
addition to the suggestions I had for the wording of the comments overall in the
above email I sent).

Melanie Plageman

Attachment

0002-Add-test-to-validate-speculative-wait-is-performed.patch

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

16 May 2019, 21:03:41

Hi,

On 2019-05-16 13:59:47 -0700, Melanie Plageman wrote:
> On Wed, May 15, 2019 at 10:32 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
> 
> >
> > The second index would help to hold the session after inserting the tuple
> > in unique index but before completing the speculative insert. Hence, helps
> > to create the condition easily. I believe order of index insertion is
> > helping here that unique index is inserted and then non-unique index is
> > inserted too.
> >
> >
> Oh, cool. I didn't know that execution order would be guaranteed for which
> index
> to insert into first.

It's not *strictly* speaking *always* well defined. The list of indexes
is sorted by the oid of the index - so once created, it's
consistent. But when the oid assignment wraps around, it'd be the other
way around. But I think it's ok to disregard that - it'll never happen
in regression tests run against a new cluster, and you'd have to run
tests against an installed cluster for a *LONG* time for a *tiny* window
where the wraparound would happen precisely between the creation of the
two indexes.

Makes sense?

I guess we could make that case a tiny bit easier to diagnose in the
extremely unlikely case it happens by having a step that outputs
SELECT 'index_a'::regclass::int8 < 'index_b'::regclass::int8;

Greetings,

Andres Freund

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

17 May 2019, 03:46:11

On Thu, May 16, 2019 at 2:03 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-05-16 13:59:47 -0700, Melanie Plageman wrote:
> On Wed, May 15, 2019 at 10:32 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
>
> >
> > The second index would help to hold the session after inserting the tuple
> > in unique index but before completing the speculative insert. Hence, helps
> > to create the condition easily. I believe order of index insertion is
> > helping here that unique index is inserted and then non-unique index is
> > inserted too.
> >
> >
> Oh, cool. I didn't know that execution order would be guaranteed for which
> index
> to insert into first.

It's not *strictly* speaking *always* well defined. The list of indexes
is sorted by the oid of the index - so once created, it's
consistent. But when the oid assignment wraps around, it'd be the other
way around. But I think it's ok to disregard that - it'll never happen
in regression tests run against a new cluster, and you'd have to run
tests against an installed cluster for a *LONG* time for a *tiny* window
where the wraparound would happen precisely between the creation of the
two indexes.

Makes sense?

Yep, thanks.

I guess we could make that case a tiny bit easier to diagnose in the
extremely unlikely case it happens by having a step that outputs
SELECT 'index_a'::regclass::int8 < 'index_b'::regclass::int8;

Good idea.

I squashed the changes I suggested in previous emails, Ashwin's patch, my
suggested updates to that patch, and the index order check all into one updated
patch attached.

Melanie Plageman

Attachment

0003-Add-test-to-validate-speculative-wait-is-performed.patch

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

17 May 2019, 16:23:53

On Thu, May 16, 2019 at 8:46 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

I squashed the changes I suggested in previous emails, Ashwin's patch, my
suggested updates to that patch, and the index order check all into one updated
patch attached.

I realized that the numbers at the front probably indicate which patch it is in
a patch set and not the version, so, if that is the case, a renamed patch --
second version but the only patch needed if you are applying to master.

Is this right?

Melanie Plageman

Attachment

0001-Add-test-to-validate-speculative-wait-is-performed-v2.patch

Re: Adding a test for speculative insert abort case

From

Alvaro Herrera

Date:

17 May 2019, 17:30:28

On 2019-May-17, Melanie Plageman wrote:

> I realized that the numbers at the front probably indicate which patch
> it is in a patch set and not the version, so, if that is the case, a
> renamed patch -- second version but the only patch needed if you are
> applying to master.  Is this right?

That's correct.  I suggest that "git format-patch -vN origin/master",
where the N is the version you're currently posting, generates good
patch files to attach in email.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

05 June 2019, 22:49:47

On Thu, May 16, 2019 at 8:46 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

Good idea.
I squashed the changes I suggested in previous emails, Ashwin's patch, my
suggested updates to that patch, and the index order check all into one updated
patch attached.

I've updated this patch to make it apply on master cleanly. Thanks to
Alvaro for format-patch suggestion.

The first patch in the set adds the speculative wait case discussed
above from Ashwin's patch.

The second patch in the set is another suggestion I have. I noticed
that the insert-conflict-toast test mentions that it is "not
guaranteed to lead to a failed speculative insertion" and, since it
seems to be testing the speculative abort but with TOAST tables, I
thought it might work to kill that spec file and move that test case
into insert-conflict-specconflict so the test can utilize the existing
advisory locks being used for the other tests in that file to make it
deterministic which session succeeds in inserting the tuple.

Melanie Plageman

Attachment

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

24 July 2019, 18:48:06

Hi,

On 2019-06-05 15:49:47 -0700, Melanie Plageman wrote:
> On Thu, May 16, 2019 at 8:46 PM Melanie Plageman <melanieplageman@gmail.com>
> wrote:
> 
> >
> > Good idea.
> > I squashed the changes I suggested in previous emails, Ashwin's patch, my
> > suggested updates to that patch, and the index order check all into one
> > updated
> > patch attached.
> >
> >
> I've updated this patch to make it apply on master cleanly. Thanks to
> Alvaro for format-patch suggestion.

Planning to push this, now that v12 is branched off. But only to master, I
don't think it's worth backpatching at the moment.


> The second patch in the set is another suggestion I have. I noticed
> that the insert-conflict-toast test mentions that it is "not
> guaranteed to lead to a failed speculative insertion" and, since it
> seems to be testing the speculative abort but with TOAST tables, I
> thought it might work to kill that spec file and move that test case
> into insert-conflict-specconflict so the test can utilize the existing
> advisory locks being used for the other tests in that file to make it
> deterministic which session succeeds in inserting the tuple.

Seems like a good plan.
> diff --git a/src/test/isolation/specs/insert-conflict-specconflict.spec
b/src/test/isolation/specs/insert-conflict-specconflict.spec
> index 3a70484fc2..7f29fb9d02 100644
> --- a/src/test/isolation/specs/insert-conflict-specconflict.spec
> +++ b/src/test/isolation/specs/insert-conflict-specconflict.spec
> @@ -10,7 +10,7 @@ setup
>  {
>       CREATE OR REPLACE FUNCTION blurt_and_lock(text) RETURNS text IMMUTABLE LANGUAGE plpgsql AS $$
>       BEGIN
> -        RAISE NOTICE 'called for %', $1;
> +        RAISE NOTICE 'blurt_and_lock() called for %', $1;
>  
>      -- depending on lock state, wait for lock 2 or 3
>          IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
> @@ -23,9 +23,16 @@ setup
>      RETURN $1;
>      END;$$;
>  
> +    CREATE OR REPLACE FUNCTION blurt_and_lock2(text) RETURNS text IMMUTABLE LANGUAGE plpgsql AS $$
> +    BEGIN
> +        RAISE NOTICE 'blurt_and_lock2() called for %', $1;
> +        PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 4);
> +    RETURN $1;
> +    END;$$;
> +

Any chance for a bit more descriptive naming than *2? I can live with
it, but ...


> +step "controller_print_speculative_locks" { SELECT locktype,classid,objid,mode,granted FROM pg_locks WHERE
locktype='speculative
> +token' ORDER BY granted; }

I think showing the speculative locks is possibly going to be unreliable
- the release time of speculative locks is IIRC not that reliable. I
think it could e.g. happen that speculative locks are held longer
because autovacuum spawned an analyze in the background.


> +   # Should report s1 is waiting on speculative lock
> +   "controller_print_speculative_locks"

Hm, I might be missing something, but I don't think it currently
does. Looking at the expected file:

+step controller_print_speculative_locks: SELECT locktype,classid,objid,mode,granted FROM pg_locks WHERE
locktype='speculative
+token' ORDER BY granted;
+locktype       classid        objid          mode           granted        
+

And if it showed something, it'd make the test not work, because
classid/objid aren't necessarily going to be the same from test to test.

Greetings,

Andres Freund

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

07 August 2019, 20:47:17

On Wed, Jul 24, 2019 at 11:48 AM Andres Freund <andres@anarazel.de> wrote:

> diff --git a/src/test/isolation/specs/insert-conflict-specconflict.spec b/src/test/isolation/specs/insert-conflict-specconflict.spec
> index 3a70484fc2..7f29fb9d02 100644
> --- a/src/test/isolation/specs/insert-conflict-specconflict.spec
> +++ b/src/test/isolation/specs/insert-conflict-specconflict.spec
> @@ -10,7 +10,7 @@ setup
> {
> CREATE OR REPLACE FUNCTION blurt_and_lock(text) RETURNS text IMMUTABLE LANGUAGE plpgsql AS $$
> BEGIN
> - RAISE NOTICE 'called for %', $1;
> + RAISE NOTICE 'blurt_and_lock() called for %', $1;
>
> -- depending on lock state, wait for lock 2 or 3
> IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
> @@ -23,9 +23,16 @@ setup
> RETURN $1;
> END;$$;
>
> + CREATE OR REPLACE FUNCTION blurt_and_lock2(text) RETURNS text IMMUTABLE LANGUAGE plpgsql AS $$
> + BEGIN
> + RAISE NOTICE 'blurt_and_lock2() called for %', $1;
> + PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 4);
> + RETURN $1;
> + END;$$;
> +

Any chance for a bit more descriptive naming than *2? I can live with
it, but ...

Taylor Vesely and I paired on updating this test, and, it became clear
that the way that the steps and functions are named makes it very
difficult to understand what the test is doing. That is, I helped
write this test and, after a month away, I could no longer understand
what it was doing at all.

We changed the text of the blurts to "acquiring advisory lock ..."
from "blocking" because we realized that this would print even when
the lock was acquired immediately successfully, which is a little
misleading for the reader.

He's taking a stab at some renaming/refactoring to make it more clear
(including renaming blurt_and_lock2())

> +step "controller_print_speculative_locks" { SELECT locktype,classid,objid,mode,granted FROM pg_locks WHERE locktype='speculative
> +token' ORDER BY granted; }

I think showing the speculative locks is possibly going to be unreliable
- the release time of speculative locks is IIRC not that reliable. I
think it could e.g. happen that speculative locks are held longer
because autovacuum spawned an analyze in the background.

I actually think having the "controller_print_speculative_locks"
wouldn't be a problem because we have not released the advisory lock
on 4 in s2 that allows it to complete its speculative insertion and so
s1 will still be in speculative wait.

The step that might be a problem if autovacuum delays release of the
speculative locks is the "controller_show" step, because, at that
point, if the lock wasn't released, then s1 would still be waiting and
wouldn't have updated.

> + # Should report s1 is waiting on speculative lock
> + "controller_print_speculative_locks"

Hm, I might be missing something, but I don't think it currently
does. Looking at the expected file:

+step controller_print_speculative_locks: SELECT locktype,classid,objid,mode,granted FROM pg_locks WHERE locktype='speculative
+token' ORDER BY granted;
+locktype classid objid mode granted
+

Oops! due to an errant newline, the query wasn't correct.

And if it showed something, it'd make the test not work, because
classid/objid aren't necessarily going to be the same from test to test.

Good point. In the attached patch, classid/objid columns are removed
from the SELECT list.

Melanie & Taylor

Attachment

v3-0001-Test-additional-speculative-conflict-scenarios.patch

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

21 August 2019, 20:59:00

On Wed, Aug 7, 2019 at 1:47 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

On Wed, Jul 24, 2019 at 11:48 AM Andres Freund <andres@anarazel.de> wrote:
> diff --git a/src/test/isolation/specs/insert-conflict-specconflict.spec b/src/test/isolation/specs/insert-conflict-specconflict.spec
> index 3a70484fc2..7f29fb9d02 100644
> --- a/src/test/isolation/specs/insert-conflict-specconflict.spec
> +++ b/src/test/isolation/specs/insert-conflict-specconflict.spec
> @@ -10,7 +10,7 @@ setup
> {
> CREATE OR REPLACE FUNCTION blurt_and_lock(text) RETURNS text IMMUTABLE LANGUAGE plpgsql AS $$
> BEGIN
> - RAISE NOTICE 'called for %', $1;
> + RAISE NOTICE 'blurt_and_lock() called for %', $1;
>
> -- depending on lock state, wait for lock 2 or 3
> IF pg_try_advisory_xact_lock(current_setting('spec.session')::int, 1) THEN
> @@ -23,9 +23,16 @@ setup
> RETURN $1;
> END;$$;
>
> + CREATE OR REPLACE FUNCTION blurt_and_lock2(text) RETURNS text IMMUTABLE LANGUAGE plpgsql AS $$
> + BEGIN
> + RAISE NOTICE 'blurt_and_lock2() called for %', $1;
> + PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 4);
> + RETURN $1;
> + END;$$;
> +

Any chance for a bit more descriptive naming than *2? I can live with
it, but ...

Taylor Vesely and I paired on updating this test, and, it became clear
that the way that the steps and functions are named makes it very
difficult to understand what the test is doing. That is, I helped
write this test and, after a month away, I could no longer understand
what it was doing at all.

We changed the text of the blurts to "acquiring advisory lock ..."
from "blocking" because we realized that this would print even when
the lock was acquired immediately successfully, which is a little
misleading for the reader.

He's taking a stab at some renaming/refactoring to make it more clear
(including renaming blurt_and_lock2())

So, Taylor and I had hoped to rename the steps to something more specific that
told the story of what this test is doing and made it more clear. Unfortunately,
our attempt to do that didn't work and made step re-use very difficult.
Alas, we decided the original names were less confusing.

My idea for renaming blurt_and_lock2() was actually to rename blurt_and_lock()
to blurt_and_lock_123() -- since it always takes a lock on 1,2, or 3.
Then, I could name the second function, which locks 4, blurt_and_lock_4().
What do you think?

I've attached a rebased patch updated with the new function names.

> +step "controller_print_speculative_locks" { SELECT locktype,classid,objid,mode,granted FROM pg_locks WHERE locktype='speculative
> +token' ORDER BY granted; }

I think showing the speculative locks is possibly going to be unreliable
- the release time of speculative locks is IIRC not that reliable. I
think it could e.g. happen that speculative locks are held longer
because autovacuum spawned an analyze in the background.

I actually think having the "controller_print_speculative_locks"
wouldn't be a problem because we have not released the advisory lock
on 4 in s2 that allows it to complete its speculative insertion and so
s1 will still be in speculative wait.

The step that might be a problem if autovacuum delays release of the
speculative locks is the "controller_show" step, because, at that
point, if the lock wasn't released, then s1 would still be waiting and
wouldn't have updated.

So, what should we do about this? Do you agree that the "controller_show" step
would be a problem?

Melanie Plageman

Attachment

v4-0001-Test-additional-speculative-conflict-scenarios.patch

Re: Adding a test for speculative insert abort case

From

Michael Paquier

Date:

11 November 2019, 08:41:10

On Wed, Aug 21, 2019 at 01:59:00PM -0700, Melanie Plageman wrote:
> So, what should we do about this? Do you agree that the "controller_show"
> step would be a problem?

Andres, it seems to me that this is waiting some input from you.
--
Michael

Attachment

signature.asc

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

07 February 2020, 07:02:58

Hi,

On 2019-08-21 13:59:00 -0700, Melanie Plageman wrote:
> >> > +step "controller_print_speculative_locks" { SELECT
> >> locktype,classid,objid,mode,granted FROM pg_locks WHERE
> >> locktype='speculative
> >> > +token' ORDER BY granted; }
> >>
> >> I think showing the speculative locks is possibly going to be unreliable
> >> - the release time of speculative locks is IIRC not that reliable. I
> >> think it could e.g. happen that speculative locks are held longer
> >> because autovacuum spawned an analyze in the background.
> >>
> >>
> > I actually think having the "controller_print_speculative_locks"
> > wouldn't be a problem because we have not released the advisory lock
> > on 4 in s2 that allows it to complete its speculative insertion and so
> > s1 will still be in speculative wait.

Hm. At the very least it'd have to be restricted to only match locks in
the same database - e.g. for parallel installcheck it is common for
there to be other concurrent tests.  I'll add that when committing, no
need for a new version.

I'm also a bit concerned that adding the pg_locks query would mean we can't
run the test in parallel with others, if we ever finally get around to
adding a parallel isolationtester schedule (which is really needed, it's
too slow as is).
https://postgr.es/m/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql
But I guess we we'll just deal with not running this test in parallel.

> > The step that might be a problem if autovacuum delays release of the
> > speculative locks is the "controller_show" step, because, at that
> > point, if the lock wasn't released, then s1 would still be waiting and
> > wouldn't have updated.
> >
> So, what should we do about this? Do you agree that the "controller_show"
> step would be a problem?

It hasn't caused failures so far, I think. Or are you saying you think
it's more likely to cause failures in the added tests?

Had planned to commit now, but I'm not able to think through the state
transitions at this hour, apparently :(. I'll try to do it tomorrow
morning.

Greetings,

Andres Freund

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

08 February 2020, 00:40:46

Hi,

I'm currently fighting with a race I'm observing in about 1/4 of the
runs. I get:
@@ -361,16 +361,18 @@
 locktype       mode           granted

 speculative tokenShareLock      f
 speculative tokenExclusiveLock  t
 step controller_unlock_2_4: SELECT pg_advisory_unlock(2, 4);
 pg_advisory_unlock

 t
 s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
 s1: NOTICE:  acquiring advisory lock on 2
+s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
 step s1_upsert: <... completed>
 step s2_upsert: <... completed>
 step controller_show: SELECT * FROM upserttest;
 key            data

 k1             inserted s2 with conflict update s1
(this is the last permutation)

The issue is basically that s1 goes through the
check_exclusion_or_unique_constraint() conflict check twice.

I added a bit of debugging information (using fprintf, with elog it was
much harder to hit):

Success:
2020-02-07 16:14:56.501 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at
RAISE
1167003: acquiring xact lock
2020-02-07 16:14:56.512 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.522 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 2 3
1167004: acquired xact lock
1167004: xid 8466 acquiring 5
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] LOG:  INSERT @ 0/9CC70F0:  - Heap/INSERT: off 2 flags 0x0C
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] NOTICE:  blurt_and_lock_123() called for k1 in session 2
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at
RAISE
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at
RAISE
1167004: acquiring xact lock
2020-02-07 16:14:56.533 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.544 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.555 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.565 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.576 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.587 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 2 2
1167004: acquired xact lock
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] LOG:  INSERT @ 0/9CC7150:  - Btree/NEWROOT: lev 0
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] LOG:  INSERT @ 0/9CC7190:  - Btree/INSERT_LEAF: off 1
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] NOTICE:  blurt_and_lock_4() called for k1 in session 2
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] NOTICE:  acquiring advisory lock on 4
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167004: acquiring xact lock
2020-02-07 16:14:56.598 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.609 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.620 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.630 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 1 2
1167003: acquired xact lock
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] LOG:  INSERT @ 0/9CC71D0:  - Btree/INSERT_LEAF: off 1
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] NOTICE:  blurt_and_lock_4() called for k1 in session 1
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] NOTICE:  acquiring advisory lock on 4
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] LOG:  INSERT @ 0/9CC7230:  - Btree/NEWROOT: lev 0
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] LOG:  INSERT @ 0/9CC7270:  - Btree/INSERT_LEAF: off 1
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] LOG:  INSERT @ 0/9CC72A8:  - Heap/DELETE: off 1 flags 0x08
1167003: xid 8465 releasing lock 5
1167003: retry due to conflict
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE:  blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at
RAISE
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at
RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE:  blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at
RAISE
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at
RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
1167003: xid 8465 waiting xwait 8466 (xmin 8466 xmax 0) spec 5
2020-02-07 16:14:56.642 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.653 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.667 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.677 PST [1167001][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 2 4
1167004: acquired xact lock
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG:  INSERT @ 0/9CC72E8:  - Btree/INSERT_LEAF: off 2
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG:  INSERT @ 0/9CC7318:  - Heap/HEAP_CONFIRM: off 2
1167004: xid 8466 releasing lock 5
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG:  INSERT @ 0/9CC7348:  - Transaction/COMMIT: 2020-02-07
16:14:56.678602-08
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG:  xlog flush request 0/9CC7348; write 0/9CBDF58; flush
0/9CBDF58
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] STATEMENT:  INSERT INTO upserttest(key, data) VALUES('k1', 'inserted
s2')ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s2';
 
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] NOTICE:  blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at
RAISE
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at
RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] LOG:  INSERT @ 0/9CC7380:  - Heap/LOCK: off 2: xid 8465: flags 0x00
LOCK_ONLYEXCL_LOCK
 
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] LOG:  INSERT @ 0/9CC73F0:  - Heap/HOT_UPDATE: off 2 xmax 8465 flags
0x10; new off 3 xmax 8465
 
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] LOG:  INSERT @ 0/9CC7420:  - Transaction/COMMIT: 2020-02-07
16:14:56.679085-08
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] LOG:  xlog flush request 0/9CC7420; write 0/9CC7060; flush
0/9CC7060
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] STATEMENT:  INSERT INTO upserttest(key, data) VALUES('k1', 'inserted
s1')ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s1'; 

fail:
1167056: releasing xact lock 2 3
1167058: acquired xact lock
1167058: xid 8490 acquiring 5
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] LOG:  INSERT @ 0/9D2D1D8:  - Heap/INSERT: off 2 flags 0x0C
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] NOTICE:  blurt_and_lock_123() called for k1 in session 2
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167058: acquiring xact lock
2020-02-07 16:16:44.000 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.011 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.022 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.033 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.044 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.054 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
1167056: releasing xact lock 2 2
1167058: acquired xact lock
2020-02-07 16:16:44.055 PST [1167058][6/11:8490] LOG:  INSERT @ 0/9D2D238:  - Btree/NEWROOT: lev 0
2020-02-07 16:16:44.055 PST [1167058][6/11:8490] LOG:  INSERT @ 0/9D2D278:  - Btree/INSERT_LEAF: off 1
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] NOTICE:  blurt_and_lock_4() called for k1 in session 2
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] NOTICE:  acquiring advisory lock on 4
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167058: acquiring xact lock
2020-02-07 16:16:44.066 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.076 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.087 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.098 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
1167056: releasing xact lock 1 2
1167057: acquired xact lock
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] LOG:  INSERT @ 0/9D2D2B8:  - Btree/INSERT_LEAF: off 1
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] NOTICE:  blurt_and_lock_4() called for k1 in session 1
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] NOTICE:  acquiring advisory lock on 4
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] LOG:  INSERT @ 0/9D2D318:  - Btree/NEWROOT: lev 0
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] LOG:  INSERT @ 0/9D2D358:  - Btree/INSERT_LEAF: off 1
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] LOG:  INSERT @ 0/9D2D390:  - Heap/DELETE: off 1 flags 0x08
1167057: xid 8489 releasing lock 5
1167057: retry due to conflict
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE:  blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE:  blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 5
2020-02-07 16:16:44.110 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.121 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.135 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.145 PST [1167055][3/0:0] DEBUG:  bind <unnamed> to isolationtester_waiting
1167056: releasing xact lock 2 4
1167058: acquired xact lock
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG:  INSERT @ 0/9D2D3D0:  - Btree/INSERT_LEAF: off 2
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG:  INSERT @ 0/9D2D400:  - Heap/HEAP_CONFIRM: off 2
1167058: xid 8490 releasing lock 5
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG:  INSERT @ 0/9D2D430:  - Transaction/COMMIT: 2020-02-07
16:16:44.146767-08
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG:  xlog flush request 0/9D2D430; write 0/9D24058; flush 0/9D24058
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] STATEMENT:  INSERT INTO upserttest(key, data) VALUES('k1', 'inserted
s2')ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s2';
 
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] NOTICE:  blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 0
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] NOTICE:  blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] NOTICE:  acquiring advisory lock on 2
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] CONTEXT:  PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG:  INSERT @ 0/9D2D468:  - Heap/LOCK: off 2: xid 8489: flags 0x00
LOCK_ONLYEXCL_LOCK
 
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG:  INSERT @ 0/9D2D4D8:  - Heap/HOT_UPDATE: off 2 xmax 8489 flags
0x10; new off 3 xmax 8489
 
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG:  INSERT @ 0/9D2D508:  - Transaction/COMMIT: 2020-02-07
16:16:44.147348-08
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG:  xlog flush request 0/9D2D508; write 0/9D2D148; flush 0/9D2D148
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] STATEMENT:  INSERT INTO upserttest(key, data) VALUES('k1', 'inserted
s1')ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s1';
 


The important bit is here the different "xid .* waiting xwait .* spec*"
lines. In the success case we see:

1167003: xid 8465 waiting xwait 8466 (xmin 8466 xmax 0) spec 5
1167002: releasing xact lock 2 4
1167004: acquired xact lock
1167004: xid 8466 releasing lock 5
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG:  INSERT @ 0/9CC7348:  - Transaction/COMMIT: 2020-02-07
16:14:56.678602-08
1167003: acquired xact lock

In the failing case we see:
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 5
1167056: releasing xact lock 2 4
1167058: acquired xact lock
1167058: xid 8490 releasing lock 5
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG:  INSERT @ 0/9D2D430:  - Transaction/COMMIT: 2020-02-07
16:16:44.146767-08
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 0
1167057: acquired xact lock


I think the issue here is that determines whether s1 can finish its
check_exclusion_or_unique_constraint() check with one retry is whether
it reaches it does the tuple visibility test before s2's transaction has
actually marked itself as visible (note that ProcArrayEndTransaction is
after RecordTransactionCommit logging the COMMIT above).

I think the fix is quite easy: Ensure that there *always* will be the
second wait iteration on the transaction (in addition to the already
always existing wait on the speculative token). Which is just adding
s2_begin s2_commit steps.   Simple, but took me a few hours to
understand :/.

I've attached that portion of my changes. Will interrupt scheduled
programming for a bit of exercise now.

Greetings,

Andres Freund

Attachment

fix.diff

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

12 February 2020, 00:45:32

Hi,

On 2020-02-07 16:40:46 -0800, Andres Freund wrote:
> I'm currently fighting with a race I'm observing in about 1/4 of the
> runs. [...]
> I think the issue here is that determines whether s1 can finish its
> check_exclusion_or_unique_constraint() check with one retry is whether
> it reaches it does the tuple visibility test before s2's transaction has
> actually marked itself as visible (note that ProcArrayEndTransaction is
> after RecordTransactionCommit logging the COMMIT above).
> 
> I think the fix is quite easy: Ensure that there *always* will be the
> second wait iteration on the transaction (in addition to the already
> always existing wait on the speculative token). Which is just adding
> s2_begin s2_commit steps.   Simple, but took me a few hours to
> understand :/.
> 
> I've attached that portion of my changes. Will interrupt scheduled
> programming for a bit of exercise now.

I've pushed this now. Thanks for the patch, and the review!

I additionally restricted the controller_print_speculative_locks step to
the current database and made a bunch of debug output more
precise. Survived ~150 runs locally.

Lets see what the buildfarm says...

Regards,

Andres

Re: Adding a test for speculative insert abort case

From

Melanie Plageman

Date:

12 February 2020, 01:20:05

On Tue, Feb 11, 2020 at 4:45 PM Andres Freund <andres@anarazel.de> wrote:

I additionally restricted the controller_print_speculative_locks step to
the current database and made a bunch of debug output more
precise. Survived ~150 runs locally.

Lets see what the buildfarm says...

Thanks so much for finishing the patch and checking for race
conditions!

Melanie Plageman

Re: Adding a test for speculative insert abort case

From

Amit Kapila

Date:

24 March 2020, 12:33:57

On Wed, Feb 12, 2020 at 6:50 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
>
> On Tue, Feb 11, 2020 at 4:45 PM Andres Freund <andres@anarazel.de> wrote:
>>
>>
>> I additionally restricted the controller_print_speculative_locks step to
>> the current database and made a bunch of debug output more
>> precise. Survived ~150 runs locally.
>>
>> Lets see what the buildfarm says...
>>
>
> Thanks so much for finishing the patch and checking for race
> conditions!
>

Can we change the status of CF entry for this patch [1] to committed
or is there any work pending?


[1] - https://commitfest.postgresql.org/27/2200/
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Adding a test for speculative insert abort case

From

Andres Freund

Date:

24 March 2020, 19:13:46

On 2020-03-24 18:03:57 +0530, Amit Kapila wrote:
> Can we change the status of CF entry for this patch [1] to committed
> or is there any work pending?

Done!