Thread: BUG #18103: bugs of concurrent merge into when use different join plan

BUG #18103: bugs of concurrent merge into when use different join plan

From

PG Bug reporting form

Date:

10 September 2023, 15:57:14

The following bug has been logged on the website:

Bug reference:      18103
Logged by:          luajk
Email address:      luajk@qq.com
PostgreSQL version: 16rc1
Operating system:   ANY
Description:

Use Case:

drop table t1;
drop table t2;

create table t1(a int not null, b int);
create table t2(a int not null, b int);

insert into t1 values(generate_series(1,3), generate_series(1,3));
insert into t2 values(generate_series(1,3), generate_series(1,3));

execute in two concurrent sessions and do as following:

session1:
begin;
session2:
begin;
set enable_hashjoin = off;
set enable_mergejoin = off;
merge into t1 p using (select distinct a,b from t2) q on p.a = q.a when
matched then update set b = q.b when not matched then insert values(q.a,
q.b);
session1:
set enable_hashjoin = off;
set enable_mergejoin = off;
merge into t1 p using (select distinct a,b from t2) q on p.a = q.a when
matched then update set b = q.b when not matched then insert values(q.a,
q.b);
session2:
commit;
session1:
select count(*) from t1;  -- there are 5 rows;

however:

session1:
begin;
session2:
begin;
set enable_nestloop = off;
set enable_mergejoin = off;
merge into t1 p using (select distinct a,b from t2) q on p.a = q.a when
matched then update set b = q.b when not matched then insert values(q.a,
q.b);
session1:
set enable_nestloop = off;
set enable_mergejoin = off;
merge into t1 p using (select distinct a,b from t2) q on p.a = q.a when
matched then update set b = q.b when not matched then insert values(q.a,
q.b);
session2:
commit;
session1:
select count(*) from t1;  -- there are 3 rows;

nest loop join is the same as merge join in such scenes.

Re: BUG #18103: bugs of concurrent merge into when use different join plan

From

David Rowley

Date:

14 September 2023, 10:29:40

On Mon, 11 Sept 2023 at 01:03, PG Bug reporting form
<noreply@postgresql.org> wrote:
> session1:
> begin;
> session2:
> begin;
> set enable_hashjoin = off;
> set enable_mergejoin = off;
> merge into t1 p using (select distinct a,b from t2) q on p.a = q.a when
> matched then update set b = q.b when not matched then insert values(q.a,
> q.b);
> session1:
> set enable_hashjoin = off;
> set enable_mergejoin = off;
> merge into t1 p using (select distinct a,b from t2) q on p.a = q.a when
> matched then update set b = q.b when not matched then insert values(q.a,
> q.b);
> session2:
> commit;
> session1:
> select count(*) from t1;  -- there are 5 rows;

I agree this is a bug.

What seems to be going on is that in ExecMergeMatched() we hit the
TM_Updated case and when we try to fill the epqslot calling
EvalPlanQual() the nested loop does not seem to scan to find the
correct set of rows.  It works ok for the 1=1 row (which is why we get
5 rows instead of 6), but on the 2=2 row, I see it finds rows 3=1 and
returns that. The join type is LEFT, after all, so that's the EPQ row.
Back in ExecMergeMatched(), since the DISTINCT subquery is the outer
side of the join and t1 the inner side, the following code finds a
NULL ctid:

/*
* If we got no tuple, or the tuple we get has a
* NULL ctid, go back to caller: this one is not a
* MATCHED tuple anymore, so they can retry with
* NOT MATCHED actions.
*/
if (TupIsNull(epqslot))
    return false;

(void) ExecGetJunkAttribute(epqslot, resultRelInfo->ri_RowIdAttNo, &isNull);
if (isNull)
    return false;

Since the inner side of the (left) join is NULL, the ri_RowIdAttNo
attr is NULL and we return false in the final condition.

And that effectively means we run the NOT MATCHED code and do the INSERT.

I'm not really sure how EvalPlanQualNext() is meant to correctly find
the 2=2 row given that we end up doing ExecReScan() on the nested loop
and don't seem to find the row we intend to.

David

Re: BUG #18103: bugs of concurrent merge into when use different join plan

From

Richard Guo

Date:

18 September 2023, 09:44:07

On Thu, Sep 14, 2023 at 3:30 PM David Rowley <dgrowleyml@gmail.com> wrote:

I agree this is a bug.

What seems to be going on is that in ExecMergeMatched() we hit the
TM_Updated case and when we try to fill the epqslot calling
EvalPlanQual() the nested loop does not seem to scan to find the
correct set of rows. It works ok for the 1=1 row (which is why we get
5 rows instead of 6), but on the 2=2 row, I see it finds rows 3=1 and
returns that. The join type is LEFT, after all, so that's the EPQ row.
Back in ExecMergeMatched(), since the DISTINCT subquery is the outer
side of the join and t1 the inner side, the following code finds a
NULL ctid:

/*
* If we got no tuple, or the tuple we get has a
* NULL ctid, go back to caller: this one is not a
* MATCHED tuple anymore, so they can retry with
* NOT MATCHED actions.
*/
if (TupIsNull(epqslot))
return false;

(void) ExecGetJunkAttribute(epqslot, resultRelInfo->ri_RowIdAttNo, &isNull);
if (isNull)
return false;

Since the inner side of the (left) join is NULL, the ri_RowIdAttNo
attr is NULL and we return false in the final condition.

And that effectively means we run the NOT MATCHED code and do the INSERT.

I have the same observation. It seems that the EvalPlanQual recheck
does not work well with LEFT joins if there is concurrent update in a
WHEN MATCHED case. FWIW, if the optimizer chooses RIGHT join at last
for the MERGE command, we'd get the expected results.

Thanks
Richard

Re: BUG #18103: bugs of concurrent merge into when use different join plan

From

Dean Rasheed

Date:

21 September 2023, 14:49:46

On Mon, 18 Sept 2023 at 09:51, Richard Guo <guofenglinux@gmail.com> wrote:
>
> On Thu, Sep 14, 2023 at 3:30 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> I agree this is a bug.
>>
>> What seems to be going on is that in ExecMergeMatched() we hit the
>> TM_Updated case and when we try to fill the epqslot calling
>> EvalPlanQual() the nested loop does not seem to scan to find the
>> correct set of rows.
>
> I have the same observation.  It seems that the EvalPlanQual recheck
> does not work well with LEFT joins if there is concurrent update in a
> WHEN MATCHED case.  FWIW, if the optimizer chooses RIGHT join at last
> for the MERGE command, we'd get the expected results.
>

I don't feel that I have a good understanding of the EPQ mechanism,
but src/backend/executor/README says:

"""
It is also possible that there are relations in the query that are not
to be locked (they are neither the UPDATE/DELETE target nor specified to
be locked in SELECT FOR UPDATE/SHARE).  When re-running the test query
we want to use the same rows from these relations that were joined to
the locked rows.  For ordinary relations this can be implemented relatively
cheaply by including the row TID in the join outputs and re-fetching that
TID.  (The re-fetch is expensive, but we're trying to optimize the normal
case where no re-test is needed.)  We have also to consider non-table
relations, such as a ValuesScan or FunctionScan.  For these, since there
is no equivalent of TID, the only practical solution seems to be to include
the entire row value in the join output row.
"""

and this is not happening for MERGE. For example, given this setup:

drop table if exists t1, t2;
create table t1 (id int primary key, val int);
create table t2 (id int primary key, val int);

an UPDATE, joining t2 to t1 does the following:

explain (verbose, costs off)
update t1 set val = t2.val from t2 where t2.id = t1.id;

                     QUERY PLAN
----------------------------------------------------
 Update on public.t1
   ->  Hash Join
         Output: t2.val, t1.ctid, t2.ctid
         Inner Unique: true
         Hash Cond: (t1.id = t2.id)
         ->  Seq Scan on public.t1
               Output: t1.ctid, t1.id
         ->  Hash
               Output: t2.val, t2.ctid, t2.id
               ->  Seq Scan on public.t2
                     Output: t2.val, t2.ctid, t2.id
(11 rows)

but the equivalent MERGE does not include CTIDs from t2 in the join output:

explain (verbose, costs off)
merge into t1 using t2 on t2.id = t1.id
  when matched then update set val = t2.val;

                QUERY PLAN
-------------------------------------------
 Merge on public.t1
   ->  Hash Join
         Output: t1.ctid, t2.val
         Inner Unique: true
         Hash Cond: (t1.id = t2.id)
         ->  Seq Scan on public.t1
               Output: t1.ctid, t1.id
         ->  Hash
               Output: t2.val, t2.id
               ->  Seq Scan on public.t2
                     Output: t2.val, t2.id
(11 rows)

On the face of it, that looks like a simple oversight in
preprocess_rowmarks(), and changing it, as in the attached, fixes the
reported issue.

Having said that, I'm not sure what guarantees we can really give
about the concurrent behaviour of MERGE. For example, if the source
table contained rows not present in the target, a MERGE with an INSERT
action would be basically the same as an INSERT ... WHERE NOT EXISTS
(...) with nothing to prevent concurrent sessions from inserting
duplicate rows. For concurrent updates, INSERT ... ON CONFLICT DO
UPDATE is probably a better choice.

Regards,
Dean

Attachment

fix-concurrent-merge.patch

Re: BUG #18103: bugs of concurrent merge into when use different join plan

From

Richard Guo

Date:

26 September 2023, 12:46:11

On Thu, Sep 21, 2023 at 7:49 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

I don't feel that I have a good understanding of the EPQ mechanism,
but src/backend/executor/README says:

"""
It is also possible that there are relations in the query that are not
to be locked (they are neither the UPDATE/DELETE target nor specified to
be locked in SELECT FOR UPDATE/SHARE). When re-running the test query
we want to use the same rows from these relations that were joined to
the locked rows. For ordinary relations this can be implemented relatively
cheaply by including the row TID in the join outputs and re-fetching that
TID. (The re-fetch is expensive, but we're trying to optimize the normal
case where no re-test is needed.) We have also to consider non-table
relations, such as a ValuesScan or FunctionScan. For these, since there
is no equivalent of TID, the only practical solution seems to be to include
the entire row value in the join output row.
"""

and this is not happening for MERGE.

On the face of it, that looks like a simple oversight in
preprocess_rowmarks(), and changing it, as in the attached, fixes the
reported issue.

Agreed. We need rowmarks for MERGE too. I found that the following
comment from plannodes.h is very useful in understanding this.

* When doing UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, we have to uniquely
* identify all the source rows, not only those from the target relations, so
* that we can perform EvalPlanQual rechecking at need.

To nitpick, there are some comments that may need to be updated to
include MERGE, such as the comments for ExecRowMark, RowMarkType and
PlanRowMark.

Thanks
Richard

Re: BUG #18103: bugs of concurrent merge into when use different join plan

From

Dean Rasheed

Date:

27 September 2023, 10:29:27

On Tue, 26 Sept 2023 at 10:46, Richard Guo <guofenglinux@gmail.com> wrote:
>
> Agreed.  We need rowmarks for MERGE too.  I found that the following
> comment from plannodes.h is very useful in understanding this.
>
> * When doing UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, we have to uniquely
> * identify all the source rows, not only those from the target relations, so
> * that we can perform EvalPlanQual rechecking at need.
>

Ah, good. Thanks. That confirms my understanding.

> To nitpick, there are some comments that may need to be updated to
> include MERGE, such as the comments for ExecRowMark, RowMarkType and
> PlanRowMark.
>

Agreed. I'll see about doing that.

Regards,
Dean