Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] MERGE SQL Statement for PG11
Date
Msg-id CABOikdOTh6GqZ6QFpXwSqaJoCTtJY5iVbJuDunTN4ZvJg1mY2A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] MERGE SQL Statement for PG11  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers


On Sun, Mar 11, 2018 at 11:18 AM, Peter Geoghegan <pg@bowt.ie> wrote:

It sounds like we should try to thoroughly understand why these
duplicates arose. Did you actually call EvalPlanQualSetPlan() for all
subplans at the time?


The reason for duplicates or even wrong answers is quite simple. The way UPDATE/DELETE currently works for partition table is that we expand the inheritance tree for the parent result relation and then create a subplan for each partition separately. This works fine, even when there exists a FROM/USING clause in the UPDATE/DELETE statement because the final result does not change irrespective of whether you first do a UNION ALL between all partitions and then find the candidate rows or whether you find candidate rows from individual partitions separately.

In case of MERGE though, since we are performing a RIGHT OUTER JOIN between the result relation and the source relation, we may conclude that a matching target  row does not exist for a source row, whereas it actually exists but in some other partition. For example, 

CREATE TABLE target (key int, val text) PARTITION BY LIST ( key);
CREATE TABLE part1 PARTITION OF target FOR VALUES IN (1, 2, 3);
CREATE TABLE part2 PARTITION OF target FOR VALUES IN (4, 5, 6);
CREATE TABLE source (skey integer);
INSERT INTO source VALUES (1), (4), (7);
INSERT INTO part1 VALUES (1, 'v1'), (2, 'v2'), (3, 'v3');
INSERT INTO part2 VALUES (4, 'v4'), (5, 'v5'), (6, 'v6');

postgres=# SELECT * FROM target RIGHT OUTER JOIN source ON key = skey;
 key | val | skey 
-----+-----+------
   1 | v1  |    1
   4 | v4  |    4
     |     |    7
(3 rows)

This gives the right answer. But if we join individual partitions and then do a UNION ALL,

postgres=# SELECT * FROM part1 RIGHT OUTER JOIN source ON key = skey UNION ALL SELECT * FROM part2 RIGHT OUTER JOIN source ON key = skey;
 key | val | skey 
-----+-----+------
   1 | v1  |    1
     |     |    4
     |     |    7
     |     |    1
   4 | v4  |    4
     |     |    7
(6 rows)

This is what nodeModifyTable does and hence we end up getting duplicates or even incorrectly declared NOT MATCHED rows, where as they are matched in a different partition.

I don't think not calling EvalPlanQualSetPlan() on all subplans is a problem because we really never execute those subplans. In fact. we should fix that so that those subplans are not even initialised.


As you know, there is an ON CONFLICT DO UPDATE + partitioning patch in
the works from Alvaro. In your explanation about that approach that
you cited, you wondered what the trouble might have been with ON
CONFLICT + partitioning, and supposed that the issues were similar
there. Are they? Has that turned up much?


Well, I initially thought that ON CONFLICT DO UPDATE on partition table may have the same challenges, but that's probably not the case. For INSERT ON CONFLICT it's still just an INSERT path, with some special handling for UPDATEs. Currently, for partition or inherited table, UPDATEs/DELETEs go via inheritance_planner() thus expanding inheritance for the result relation where as INSERTs go via simple grouping_planner(). 

For MERGE, we do all three DMLs. That doesn't mean we could not re-implement MERGE on the lines of INSERTs, but that would most likely mean complete re-writing of the UPDATEs/DELETEs for partition/inheritance tables. The challenges would just be the same in both cases.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Badrul Chowdhury
Date:
Subject: RE: Intermittent pg_ctl failures on Windows
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping