Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] MERGE SQL Statement for PG11
Date
Msg-id CABOikdOUoaXt1H885TC_cA1LoErEejqdVDZqG62rQkiZPPyg0Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] MERGE SQL Statement for PG11  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers


On Tue, Mar 6, 2018 at 9:55 AM, Peter Geoghegan <pg@bowt.ie> wrote:


Sure, ExecUpdate() and ExecDelete() *do* require a little special
handling for MERGE, but ExecMerge() should itself call EvalPlanQual()
for the join quals, rather than getting ExecUpdate()/ExecDelete() to
do it. All that ExecUpdate()/ExecDelete() need to tell ExecMerge() is
"you need to do your HeapTupleUpdated stuff". This not only clarifies
the new EPQ design -- it also lets you do that special case rti EPQ
stuff in the right place (BTW, I'm not a huge fan of that detail in
general, but haven't thought about it enough to say more). The new EPQ
code within ExecUpdate() and ExecDelete() is already identical, so why
not do it that way?

I especially like this organization because ExecOnConflictUpdate()
already calls ExecUpdate() without expecting it to do EPQ handling at
all, which is kind of similar to what I suggest -- in both cases, the
caller "takes care of all of the details of the scan". And, by
returning a HTSU_Result value to ExecMerge() from
ExecUpdate()/ExecDelete(), you could also do the new cardinality
violation stuff from within ExecMerge() in exactly one place -- no
need to invent new error_on_SelfUpdate arguments.

I would also think about organizing ExecMerge() handling so that
CMD_INSERT handling became WHEN NOT MATCHED handling. It's
unnecessarily confusing to have that included in the same switch
statement as CMD_UPDATE and CMD_DELETE, since that organization fails
to convey that once we reach WHEN NOT MATCHED, there is no going back.
Actually, I suppose the new EPQ organization described in the last few
paragraphs already implies that you'd do this CMD_INSERT stuff, since
the high level goal is breaking ExecMerge() into WHEN MATCHED and WHEN
NOT MATCHED sections (doing "commonality and variability analysis" for
CMD_UPDATE and CMD_DELETE only serves that high level goal).


Thanks for the feedback. I've greatly refactored the code based on your comments and I too like the resultant code. What we have now have essentially is: two functions:
 
ExecMergeMatched() - deals with matched rows. In case of concurrent update/delete, it also runs EvalPlanQual and checks if the updated row still meets the join quals. If so, it will restart and recheck if some other WHEN MATCHED action should be executed. If the target row is deleted or if the join quals fail, it simply returns and let the next function handle it

ExecMargeNotMatched() -  deals with not-matched rows. Essentially it only needs to execute INSERT if a WHEN NOT MATCHED action exists and the additional quals pass.

Per your suggestion, ExecUpdate() and ExecDelete() only returns enough information for ExecMergeMatched() to run EvalPlanQual or take any other action, as needed. So changes to these functions are now minimal. Also, the EPQ handling for UPDATE/DELETE is now common, which is nice.

I've also added a bunch of comments, but it might still not be enough. Feel free to suggest improvements there.

 
Other feedback:

* Is a new ExecMaterializeSlot() call needed for the EPQ slot? Again,
doing your own EvalPlanQual() within ExecMerge() would perhaps have
avoided this.

Fixed.
 

* Do we really need UpdateStmt raw parser node pointers in structs
that appear in execnodes.h? What's that all about?

No, it was left-over from the earlier code. Removed.
 

* Tests for transition table behavior (mixed INSERTs, UPDATEs, and
DELETEs) within triggers.sql seems like a good idea.

Ok, I will add. But not done in this version.
 

* Is this comment really accurate?:

> +               /*
> +                * If EvalPlanQual did not return a tuple, it means we
> +                * have seen a concurrent delete, or a concurrent update
> +                * where the row has moved to another partition.
> +                *
> +                * Set matched = false and loop back if there exists a NOT
> +                * MATCHED action. Otherwise, we have nothing to do for this
> +                * tuple and we can continue to the next tuple.
> +                */

Won't we loop back when a concurrent update occurs that makes the new
candidate tuple not satisfy the join qual anymore? What does this have
to do with partitioning?

Yeah, it was wrong. Fixed as part of larger reorganisation anyways.
 

* Why does MergeJoin get referenced here?:

> + * Also, since the internal MergeJoin node can hide the source and target
> + * relations, we must explicitly make the respective relation as visible so
> + * that columns can be referenced unqualified from these relations.

I meant to say underlying JOIN for MERGE. Fixed by simply saying Join.
 

* This patch could use a pg_indent.


Done.
 
* We already heard about this code from Tomas, who found it unnecessary:

> +           /*
> +            * SQL Standard says that WHEN AND conditions must not
> +            * write to the database, so check we haven't written
> +            * any WAL during the test. Very sensible that is, since
> +            * we can end up evaluating some tests multiple times if
> +            * we have concurrent activity and complex WHEN clauses.
> +            *
> +            * XXX If we had some clear form of functional labelling
> +            * we could use that, if we trusted it.
> +            */
> +           if (startWAL < GetXactWALBytes())
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                        errmsg("cannot write to database within WHEN AND condition")));

This needs to go. Apart from the fact that GetXactWALBytes() is buggy
(it returns int64 for the unsigned type XLogRecPtr), the whole idea
just seems unnecessary. I don't see why this is any different to using
a volatile function in a regular UPDATE.

I removed this code since it was wrong. We might want to add some basic checks for existence of volatile functions in the WHEN or SET clauses. But I agree, it's no different than regular UPDATEs. So may be not a big deal. 
 

* I still think we probably need to add something to ruleutils.c, so
that MERGE Query structs can be deparsed -- see get_query_def().

Ok. I will look at it. Not done in this version though.

Rebased on the current master too.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

pgsql-hackers by date:

Previous
From: Shubham Barai
Date:
Subject: Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
Next
From: David Rowley
Date:
Subject: Re: ALTER TABLE ADD COLUMN fast default