Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers
From | Pavan Deolasee |
---|---|
Subject | Re: [HACKERS] MERGE SQL Statement for PG11 |
Date | |
Msg-id | CABOikdOSsdiJLnrU6UyU-G1hergOhd74t1jsKUM4f9Fq2DZQ0w@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] MERGE SQL Statement for PG11 (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: [HACKERS] MERGE SQL Statement for PG11
|
List | pgsql-hackers |
On Fri, Mar 23, 2018 at 4:43 AM, Peter Geoghegan <pg@bowt.ie> wrote:
How's it different than running a INSERT query with the target table again specified in a subquery producing the rows to be inserted? For example,
On Thu, Mar 22, 2018 at 11:42 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> A slightly improved version attached.
You still need to remove this change:
> diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
> index a4574cd533..dbfb5d2a1a 100644
> --- a/src/include/miscadmin.h
> +++ b/src/include/miscadmin.h
> @@ -444,5 +444,6 @@ extern bool has_rolreplication(Oid roleid);
> /* in access/transam/xlog.c */
> extern bool BackupInProgress(void);
> extern void CancelBackup(void);
> +extern int64 GetXactWALBytes(void);
Sigh. Fixed in the recent version.
I see that we're still using two target RTEs in this latest revision,
v23e -- the new approach to partitioning, which I haven't had time to
study in more detail, has not produced a change there.
Yes, we continue to use two RTEs because I don't have any brighter idea than that to handle the case of partitioned table and right outer join. As I explained sometime back, this is necessary to ensure that we don't produce duplicate rows when a partition is joined with the source and then a second partition is joined again with the source.
Now I don't know if we can run a join query and still have a single RTE, but that looks improbable and wrong.
This causes
weird effects, such as the following:
"""
pg@~[20658]=# create table foo(bar int4);
CREATE TABLE
pg@~[20658]=# merge into foo f using (select 1 col) dd on f.bar=dd.col
when matched then update set bar = f.barr + 1;
ERROR: column f.barr does not exist
LINE 1: ...n f.bar=dd.col when matched then update set bar = f.barr + 1...
^
HINT: Perhaps you meant to reference the column "f.bar" or the column "f.bar".
"""
While I think this this particular HINT buglet is pretty harmless, I
continue to be concerned about the unintended consequences of having
multiple RTEs for MERGE's target table. Each RTE comes from a
different lookup path -- the first one goes through setTargetTable()'s
parserOpenTable() + addRangeTableEntryForRelation() calls. The second
one goes through transformFromClauseItem(), for the join, which
ultimately ends up calling transformTableEntry()/addRangeTableEntry().
How's it different than running a INSERT query with the target table again specified in a subquery producing the rows to be inserted? For example,
postgres=# insert into target as t select sid from source s join target t on t.ttid = s.sid;
ERROR: column t.ttid does not exist
LINE 1: ...rget as t select sid from source join target t on t.ttid = s...
^
HINT: Perhaps you meant to reference the column "t.tid" or the column "t.tid".
postgres=#
This produces a very similar looking HINT as your test above. I am certain that "target" table gets two RTEs, exactly via the same code paths as you discussed above. So if this is not a problem for INSERT, why it would be a problem for MERGE? May be I am missing a point here.
Consider commit 5f173040, which fixed a privilege escalation security
bug around multiple name lookup. Could the approach taken by MERGE
here introduce a similar security issue?
Using GDB, I see two calls to RangeVarGetRelidExtended() when a simple
MERGE is executed. They both have identical relation arguments, that
look like this:
(gdb) p *relation
$4 = {
type = T_RangeVar,
catalogname = 0x0,
schemaname = 0x0,
relname = 0x5600ebdcafb0 "foo",
inh = 1 '\001',
relpersistence = 112 'p',
alias = 0x5600ebdcb048,
location = 11
}
This seems like something that needs to be explained, at a minimum.
Even if I'm completely wrong about there being a security hazard,
maybe the suggestion that there might be still gives you some idea of
what I mean about unintended consequences.
Ok. I will try to explain it better and also think about the security hazards.
Thanks,
Pavan
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: