Re: MERGE ... WHEN NOT MATCHED BY SOURCE - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: MERGE ... WHEN NOT MATCHED BY SOURCE
Date
Msg-id CAEZATCVR6JFCb9BgExWBpugB9Cs1h02LKD-7XZh_UD4OoTORwA@mail.gmail.com
Whole thread Raw
In response to Re: MERGE ... WHEN NOT MATCHED BY SOURCE  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On Thu, 21 Mar 2024 at 09:35, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Trivial rebase forced by 6185c9737c.
>

I think it would be good to get this committed.

It has had a decent amount of review, at least up to v9, but a number
of things have changed since then:

1). Concurrent update behaviour -- now if a concurrent update causes a
matched candidate row to no longer match the join condition, it will
execute the first qualifying NOT MATCHED BY SOURCE action, and then
the first qualifying NOT MATCHED [BY TARGET] action. I.e., it may
execute 2 actions, which makes sense because if the rows had started
out not matching, the full join would have output 2 rows.

2). ResultRelInfo now has 3 lists of actions, one per match kind.
Previously I was putting the NOT MATCHED BY SOURCE actions in the same
list as the MATCHED actions, since they are both handled by
ExecMergeMatched(). However, to achieve (1) above, it turned out to be
easier to have 3 separate lists, and this makes some other code a
little neater.

3). I've added a new field Query.mergeJoinCondition so that
transformMergeStmt() no longer puts the join conditions in
qry->jointree->quals. That's necessary to make it work correctly on an
auto-updatable view which might have its own quals, but it also seems
neater anyway.

4). To distinguish the MATCHED case from NOT MATCHED BY SOURCE case in
the executor, it now uses the join condition (previously it added a
"source IS [NOT] NULL" clause to each merge action). This has the
advantage that it involves just one qual check per candidate row,
rather than one for each action. On the downside, it's checking the
join condition twice (since the source subplan's join node already
checked it), but I couldn't see an easy way round that. (It only does
this if there are both MATCHED and NOT MATCHED BY SOURCE actions, so
it's not making any existing queries worse.)

5). To support (4), I added new fields
ModifyTablePath.mergeJoinConditions, ModifyTable.mergeJoinConditions
and ResultRelInfo.ri_MergeJoinCondition, since the attribute numbers
in the join condition might vary by partition.

6). I got rid of Query.mergeSourceRelation, which is no longer needed,
because of (4). (And as before, it also gets rid of
Query.mergeUseOuterJoin, since the parser is no longer making the
decision about what kind of join to build.)

7). To support (1), I added a new field
ModifyTableState.mt_merge_pending_not_matched, because if it has to
execute 2 actions following a concurrent update, and there is a
RETURNING clause, it has to defer the second action until the next
call to ExecModifyTable().

8). I've added isolation tests to test (1).

9). I've added a lot more regression tests.

10). I've made a lot of comment changes in nodeModifyTable.c,
especially relating to the discussion around concurrent updates.

Overall, I feel like this is in pretty good shape, and it manages to
make a few code simplifications that look quite nice.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Amit Kapila
Date:
Subject: Re: pgsql: Track last_inactive_time in pg_replication_slots.