Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update - Mailing list pgsql-bugs

From Bernice Southey
Subject Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
Date
Msg-id CAEDh4nw=h5NihhuBgq4GFCfPP9vcg54QJ+K0XsG9S0owWDUhfA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update  (Tender Wang <tndrwang@gmail.com>)
List pgsql-bugs
Hi,

I've spent some time looking at the code and history of this bug
because I also ran into it [1] and I was deeply curious why concurrent
updates worked with a hash join + seq scan but not a nested loop +
index scan.

I was also curious if this bug is why Yuri didn't see deadlocks in
this related bug [2].

FWIW, I found a simpler way to reproduce this:

create table t(id int primary key, count int);
insert into t values (1, 0), (2, 0);

--session 1
begin;
update t set count = count + 1;

--session 2
set enable_seqscan = off;
update t set count = count + 1 from (values (1), (2)) v(id) where t.id = v.id;

Only one update succeeds in session 2.

Tender Wang wrote:
> Dean Rasheed wrote:
>> I'm somewhat conflicted as to which approach is better. I think maybe
>> there is less chance of other unintended side-effects if the set of
>> RTEs included in PlannerGlobal.allRelids, unprunableRelids, and
>> es_unpruned_relids is not changed. However, as it stands,
>> PlannerGlobal.allRelids is misnamed (it should probably have been
>> called "relationRelids", in line with the "relationOids" field).
>> Making it actually include all RTEs would solve that.

> Per the comment, I guess Amit didn't plan to include all RTEs in his design.

While digging, I noticed that allRelids seems only to be used to
create unprunableRelids. But unprunableRelids is used among other
things, to create es_unpruned_relids and as an unpruned_relids
parameter to ExecInitRangeTable. It's difficult to follow all the
paths where the missing relids will matter. So I think Tender's
approach is safer if amended to include all relids. I'm very new to
postgres source code, so this is a very tentative opinion. Amit can
best answer after his holiday.

I was curious enough to test in the meantime. I made an experimental
patch that just adds all rels to allRelids. This fixed all the issues
I'm aware of. It caused overexplain tests to fail by adding in new
non-relation relIds as expected. Apologies if I'm getting the
terminology wrong.

I ran Yuri's reproducer on my patched branch and indeed got several deadlocks.
psql:crash.sql:3: ERROR:  deadlock detected
DETAIL:  Process 71673 waits for ShareLock on transaction 2766;
blocked by process 71643.
Process 71643 waits for ShareLock on transaction 2777; blocked by process 71673.

Curiously, in the test I added (based on Dean's), enable_indexscan or
enable_seqscan didn't make any difference to the plan. But when I run
the reproducer manually, it does change the plan. Since the bug
disappears with a hash join + seq scan, the plan is critical. So I
included 'set enable_seqscan to 0' for reference and future proofing.

I didn't attempt to fix the comment because I don't understand it. I
also didn't check all my tabs vs spaces, or run TAP tests, as this
patch is just meant for info. Most of the code is still way over my
head, so my apologies if none of this is helpful.

As to why this works for a hash join but not a nested loop, I have
some ideas, but I'm still digging.

Happy New Year!
Bernice

[1]
https://www.postgresql.org/message-id/flat/CAMbWs4-uC8DWRZvByHej%2B9E5em7V_%2BzNyRfFx-UUpR4HwifK4w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/ly6rss443iajiv5cfypzu6fgrmxdbwzdtuujkddr4eis7s2gcq%40gcdytigety6e

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #19368: f54af9f does not create the correct macro for autotools build
Next
From: Masahiko Sawada
Date:
Subject: Re: BUG #19368: f54af9f does not create the correct macro for autotools build