Home > mailing lists

Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers

From	Pavan Deolasee
Subject	Re: [HACKERS] MERGE SQL Statement for PG11
Date	March 11, 2018 08:22:02
Msg-id	CABOikdPu4NCQ3rZo95bONOzZFQ5yYavwy0Ziyc9td-ngsgiwKA@mail.gmail.com Whole thread
In response to	Re: [HACKERS] MERGE SQL Statement for PG11 (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: [HACKERS] MERGE SQL Statement for PG11
List	pgsql-hackers

Tree view

On Sun, Mar 11, 2018 at 9:27 AM, Peter Geoghegan <pg@bowt.ie> wrote:

>>
>> We're talking about the scantuple here. It's not like excluded.*.

I often care about things like system columns not because of the
user-visible functionality, but because it reassures me that the
design is robust.

Ok. I will look at it. I think it shouldn't be too difficult and the original restriction was mostly a fallout of expecting CHECK constraint style expressions there.

>> I think that this is a blocker, unfortunately.
>
> You mean OVERRIDING or ruleutils?

I meant OVERRIDING, but ruleutils seems like something we need, too.

Ok. OVERRIDING is done. I think we can support ruleutils easily too. I don't know how to test that though.

>> >> * I still think we probably need to add something to ruleutils.c, so
>> >> that MERGE Query structs can be deparsed -- see get_query_def().
>> >
>> > Ok. I will look at it. Not done in this version though.
>>
>> I also wonder what it would take to support referencing CTEs. Other
>> implementations do have this. Why can't we?
>
>
> Hmm. I will look at them. But TBH I would like to postpone them to v12 if
> they turn out to be tricky. We have already covered a very large ground in
> the last few weeks, but we're approaching the feature freeze date.

I quickly implemented CTE support myself (not wCTE support, since
MERGE doesn't use RETURNING), and it wasn't tricky. It seems to work
when I mechanically duplicate the approach taken with other types of
DML statement in the parser. I have written a few tests, and so far it
holds up.

Ok, thanks. I started doing something similar, but great if you have already implemented. I will focus on other things for now.

I also undertook something quite a bit harder: Changing the
representation of the range table from parse analysis on. As I
mentioned in passing at one point, I'm concerned about the fact that
there are two RTEs for the target relation in all cases. You
introduced a separate Query.resultRelation-style RTI index,
Query.mergeTarget_relation, and we see stuff like this through every
stage of query processing, from parse analysis right through to
execution. One problem with the existing approach is that it leaves
many cases where EXPLAIN MERGE shows the target relation alias "t" as
"t_1" for some planner nodes, though not for others. More importantly,
I suspect that having two distinct RTEs has introduced special cases
that are not really needed, and will be more bug-prone (in fact, there
are bugs already, which I'll get to at the end). I think that it's
fair to say that what happens in the patch with EvalPlanQual()'s RTI
argument is ugly, especially because we also have a separate
resultRelInfo that we *don't* use. We should aspire to have a MERGE
implementation that isn't terribly different to UPDATE or DELETE,
especially within the executor.

I thought for a while about this and even tried multiple approaches before settling for what we have today. The biggest challenge is that inheritance/partition tables take completely different paths for INSERTs and UPDATE/DELETE. The RIGHT OUTER JOIN makes it kinda difficult because the regular UPDATE/DELETE code path ends up throwing duplicates when the source table is joined with individual partitions. IIRC that's the sole reason why I'd to settle on pushing the JOIN underneath, give it SELECT like treatment and then handle UPDATE/DELETE in the executor.

I wrote a very rough patch that arranged for the MERGE rtable to have
only a single relation RTE. My approach was to teach
transformFromClauseItem() and friends to recognize that they shouldn't
create a new RTE for the MERGE target RangeVar. I actually added some
hard coding to getRTEForSpecialRelationTypes() for this, which is ugly
as sin, but the general approach is likely salvageable (we could
invent a special type of RangeVar to do this, perhaps). The point here
is that everything just works if there isn't a separate RTE for the
join's leftarg and the setTargetTable() target, with exactly one
exception: partitioning becomes *thoroughly* broken. Every single
partitioning test fails with "ERROR: no relation entry for relid 1",
and occasionally some other "can't happen" error. This looks like it
would be hard to fix -- at least, I'd find it hard to fix.

Ok. If you've something which is workable, then great. But AFAICS this is what the original patch was doing until we came to support partitioning. Even with partitioning I could get everything to work, without duplicating the RTE, except the duplicate rows issue. I don't know how to solve that without doing what I've done or completely rewriting UPDATE/DELETE handling for inheritance/partition table. If you or others have better ideas, they are most welcome.

This is an extract from header comments for inheritance_planner()
(master branch):

* We have to handle this case differently from cases where a source relation
* is an inheritance set. Source inheritance is expanded at the bottom of the
* plan tree (see allpaths.c), but target inheritance has to be expanded at
* the top. The reason is that for UPDATE, each target relation needs a
* different targetlist matching its own column set. Fortunately,
* the UPDATE/DELETE target can never be the nullable side of an outer join,
* so it's OK to generate the plan this way.

Of course, that isn't true of MERGE: The MERGE target *can* be the
nullable side of an outer join. That's probably a big complication for
using one target RTE. Your approach to implementing partitioning [1]
seems to benefit from having two different RTEs, in a way -- you
sidestep the restriction.

Right. The entire purpose of having two different RTEs is to work around this problem. I explained this approach here [1]. I didn't receive any objections then, but that's mostly because nobody read it carefully. As I said, if we have an alternate feasible and better mechanism, let's go for it as long as efforts are justifiable.

Thanks,

Pavan

[1] https://www.postgresql.org/message-id/CABOikdM%2Bc1vB_%2B3tYEjO%3DJ6U2uNHzKU_b%3DU72tadD5-9xQcbHA%40mail.gmail.com

Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Peter Geoghegan
Date: 11 March 2018, 06:57:08
Subject: Re: [HACKERS] MERGE SQL Statement for PG11

From: Pavan Deolasee
Date: 11 March 2018, 08:27:52
Subject: Re: Faster inserts with mostly-monotonically increasing values

Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers

Previous

Next