Thread: Support UPDATE table SET(*)=...
Hi All,
Please find attached a patch which implements support for UPDATE table1 SET(*)=...
The patch supports both UPDATE table SET(*)=(a,b,c) and UPDATE table1 SET(*)=(SELECT a,b,c FROM...).
It solves the problem of doing UPDATE from a record variable of the same type as the table e.g. update foo set (*) = (select foorec.*) where ...;
The design is simple. It basically expands the * in transformation stage, does the necessary type checking and adds it to the parse tree. This allows for normal execution for the rest of the stages.
Please find attached a patch which implements support for UPDATE table1 SET(*)=...
The patch supports both UPDATE table SET(*)=(a,b,c) and UPDATE table1 SET(*)=(SELECT a,b,c FROM...).
It solves the problem of doing UPDATE from a record variable of the same type as the table e.g. update foo set (*) = (select foorec.*) where ...;
The design is simple. It basically expands the * in transformation stage, does the necessary type checking and adds it to the parse tree. This allows for normal execution for the rest of the stages.
Thoughts/Comments?
Regards,
Atri
Atri
Attachment
Hi On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma <atri.jiit@gmail.com> wrote: > Please find attached a patch which implements support for UPDATE table1 > SET(*)=... I presume you haven't read Tom Lane's proposal and discussion about multiple column assignment in UPDATE: http://www.postgresql.org/message-id/1783.1399054541@sss.pgh.pa.us (Assigning all columns was also discussed there) And there's a WIP patch: http://www.postgresql.org/message-id/20930.1402931841@sss.pgh.pa.us Regards, Marti
On Wednesday, October 15, 2014, Marti Raudsepp <marti@juffo.org> wrote:
Hi
On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
> Please find attached a patch which implements support for UPDATE table1
> SET(*)=...
I presume you haven't read Tom Lane's proposal and discussion about
multiple column assignment in UPDATE:
http://www.postgresql.org/message-id/1783.1399054541@sss.pgh.pa.us
(Assigning all columns was also discussed there)
And there's a WIP patch:
http://www.postgresql.org/message-id/20930.1402931841@sss.pgh.pa.us
Thanks for the links, but this patch only targets SET(*) case, which, if I understand correctly, the patch you mentioned doesn't directly handle (If I understand correctly, the target of the two patches is different).
Regards,
--
Regards,
Atri
l'apprenant
On Wed, Oct 15, 2014 at 2:18 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
On Wednesday, October 15, 2014, Marti Raudsepp <marti@juffo.org> wrote:Hi
On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
> Please find attached a patch which implements support for UPDATE table1
> SET(*)=...
I presume you haven't read Tom Lane's proposal and discussion about
multiple column assignment in UPDATE:
http://www.postgresql.org/message-id/1783.1399054541@sss.pgh.pa.us
(Assigning all columns was also discussed there)
And there's a WIP patch:
http://www.postgresql.org/message-id/20930.1402931841@sss.pgh.pa.usThanks for the links, but this patch only targets SET(*) case, which, if I understand correctly, the patch you mentioned doesn't directly handle (If I understand correctly, the target of the two patches is different).
Digging more, I figured that the patch I posted builds on top of Tom's patch, since it did not add whole row cases.
Regards,
Atri
Atri
On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma <atri.jiit@gmail.com> wrote: > > > On Wednesday, October 15, 2014, Marti Raudsepp <marti@juffo.org> wrote: >> >> Hi >> >> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma <atri.jiit@gmail.com> wrote: >> > Please find attached a patch which implements support for UPDATE table1 >> > SET(*)=... >> >> I presume you haven't read Tom Lane's proposal and discussion about >> multiple column assignment in UPDATE: >> http://www.postgresql.org/message-id/1783.1399054541@sss.pgh.pa.us >> (Assigning all columns was also discussed there) >> >> And there's a WIP patch: >> http://www.postgresql.org/message-id/20930.1402931841@sss.pgh.pa.us > > Thanks for the links, but this patch only targets SET(*) case, which, if I > understand correctly, the patch you mentioned doesn't directly handle (If I > understand correctly, the target of the two patches is different). Yeah -- in fact, there was some discussion about this exact case. This patch solves a very important problem: when doing record operations to move data between databases with identical schema there's currently no way to 'update' in a generic way without building out the entire field list via complicated and nasty dynamic SQL. I'm not sure about the proposed syntax though; it seems a little weird to me. Any particular reason why you couldn't have just done: UPDATE table1 SET * = a,b,c, ... also, UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); seems cleaner than the proposed syntax for row assignment. Tom objected though IIRC. merlin
On Fri, Oct 17, 2014 at 7:45 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
Yeah -- in fact, there was some discussion about this exact case.On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
>
>
> On Wednesday, October 15, 2014, Marti Raudsepp <marti@juffo.org> wrote:
>>
>> Hi
>>
>> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
>> > Please find attached a patch which implements support for UPDATE table1
>> > SET(*)=...
>>
>> I presume you haven't read Tom Lane's proposal and discussion about
>> multiple column assignment in UPDATE:
>> http://www.postgresql.org/message-id/1783.1399054541@sss.pgh.pa.us
>> (Assigning all columns was also discussed there)
>>
>> And there's a WIP patch:
>> http://www.postgresql.org/message-id/20930.1402931841@sss.pgh.pa.us
>
> Thanks for the links, but this patch only targets SET(*) case, which, if I
> understand correctly, the patch you mentioned doesn't directly handle (If I
> understand correctly, the target of the two patches is different).
This patch solves a very important problem: when doing record
operations to move data between databases with identical schema
there's currently no way to 'update' in a generic way without building
out the entire field list via complicated and nasty dynamic SQL.
Thanks!
I'm
not sure about the proposed syntax though; it seems a little weird to
me. Any particular reason why you couldn't have just done:
UPDATE table1 SET * = a,b,c, ...
also,
UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);
I honestly have not spent a lot of time thinking about the exact syntax that may be acceptable. If we have arguments for or against a specific syntax, I will be glad to incorporate them.
On 10/17/14 4:15 PM, Merlin Moncure wrote: > Any particular reason why you couldn't have just done: > > UPDATE table1 SET * = a,b,c, ... That just looks wrong to me. I'd prefer (*) = .. over that any day. > UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); > > seems cleaner than the proposed syntax for row assignment. Tom > objected though IIRC. I don't know about Tom, but I didn't like that: http://www.postgresql.org/message-id/5364C982.7060003@joh.to .marko
On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja <marko@joh.to> wrote: > On 10/17/14 4:15 PM, Merlin Moncure wrote: >> >> Any particular reason why you couldn't have just done: >> >> UPDATE table1 SET * = a,b,c, ... > > > That just looks wrong to me. I'd prefer (*) = .. over that any day. > >> UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); >> >> seems cleaner than the proposed syntax for row assignment. Tom >> objected though IIRC. > > I don't know about Tom, but I didn't like that: > http://www.postgresql.org/message-id/5364C982.7060003@joh.to Hm, I didn't understand your objection: <quoting> So e.g.: UPDATE foo f SET f = ..; would resolve to the table, despite there being a column called "f"? That would break backwards compatibility. </quoting> That's not correct: it should work exactly as 'select' does; given a conflict resolve the field name, so no backwards compatibility issue. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma <atri.jiit@gmail.com> wrote: >> Thanks for the links, but this patch only targets SET(*) case, which, if I >> understand correctly, the patch you mentioned doesn't directly handle (If I >> understand correctly, the target of the two patches is different). > Yeah -- in fact, there was some discussion about this exact case. > This patch solves a very important problem: when doing record > operations to move data between databases with identical schema > there's currently no way to 'update' in a generic way without building > out the entire field list via complicated and nasty dynamic SQL. I'm > not sure about the proposed syntax though; it seems a little weird to > me. Any particular reason why you couldn't have just done: > UPDATE table1 SET * = a,b,c, ... > also, > UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); > seems cleaner than the proposed syntax for row assignment. Tom > objected though IIRC. That last proposal is no good because it would be ambiguous if the table contains a column by that name. The "(*)" idea actually seems not bad, since it's shorthand for a parenthesized column list. I'm not sure about the patch itself though --- in particular, it doesn't seem to me that it should be touching transformTargetList, since that doesn't have anything to do with expansion of multiassignments today. Probably this is a symptom of having chosen a poor representation of the construct in the raw grammar output. However, I've not exactly wrapped my head around what that representation is ... the lack of any comments explaining it doesn't help. A larger question is whether it's appropriate to do the *-expansion at parse time, or whether we'd need to postpone it to later in order to handle reasonable use-cases. Since we expand "SELECT *" at parse time (and are mandated to do so by the SQL spec, I believe), it seems consistent to do this at parse time as well; but perhaps there is a counter argument. regards, tom lane
On 10/17/14 5:03 PM, Merlin Moncure wrote: > Hm, I didn't understand your objection: > > <quoting> > So e.g.: > UPDATE foo f SET f = ..; > > would resolve to the table, despite there being a column called "f"? > That would break backwards compatibility. > </quoting> > > That's not correct: it should work exactly as 'select' does; given a > conflict resolve the field name, so no backwards compatibility issue. local:marko=# show server_version; server_version ---------------- 9.1.13 (1 row) local:marko=#* create table foo(f int); CREATE TABLE local:marko=#* update foo f set f=1; UPDATE 0 This query would change meaning with your suggestion. I'm not saying it would be a massive problem in practice, but I think we should first consider options which don't break backwards compatibility, even if some consider them "less clean". .marko
Merlin Moncure <mmoncure@gmail.com> writes: > On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja <marko@joh.to> wrote: >> I don't know about Tom, but I didn't like that: >> http://www.postgresql.org/message-id/5364C982.7060003@joh.to > Hm, I didn't understand your objection: > <quoting> > So e.g.: > UPDATE foo f SET f = ..; > would resolve to the table, despite there being a column called "f"? > That would break backwards compatibility. > </quoting> > That's not correct: it should work exactly as 'select' does; given a > conflict resolve the field name, so no backwards compatibility issue. The point is that it's fairly messy (and bug-prone) to have a syntax where we have to make an arbitrary choice between two reasonable interpretations. If you look back at the whole thread Marko's above-cited message is in, we'd considered a bunch of different possible syntaxes for this, and none of them had much support. The "(*)" idea actually is starting to look pretty good to me. regards, tom lane
On Fri, Oct 17, 2014 at 10:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja <marko@joh.to> wrote: >>> I don't know about Tom, but I didn't like that: >>> http://www.postgresql.org/message-id/5364C982.7060003@joh.to > >> Hm, I didn't understand your objection: > >> <quoting> >> So e.g.: >> UPDATE foo f SET f = ..; > >> would resolve to the table, despite there being a column called "f"? >> That would break backwards compatibility. >> </quoting> > >> That's not correct: it should work exactly as 'select' does; given a >> conflict resolve the field name, so no backwards compatibility issue. > > The point is that it's fairly messy (and bug-prone) to have a syntax > where we have to make an arbitrary choice between two reasonable > interpretations. > > If you look back at the whole thread Marko's above-cited message is in, > we'd considered a bunch of different possible syntaxes for this, and > none of them had much support. The "(*)" idea actually is starting to > look pretty good to me. Hm, I'll take it then. merlin
Marko Tiikkaja <marko@joh.to> writes: > local:marko=#* create table foo(f int); > CREATE TABLE > local:marko=#* update foo f set f=1; > UPDATE 0 > This query would change meaning with your suggestion. I think it wouldn't; Merlin is proposing that f would be taken as the field name. A more realistic objection goes like this: create table foo(f int, g int); update foo x set x = (1,2); -- works alter table foo add column x int; update foo x set x = (1,2,3); -- no longer works It's not a real good thing if a column addition or renaming can so fundamentally change the nature of a query. regards, tom lane
On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Tiikkaja <marko@joh.to> writes: >> local:marko=#* create table foo(f int); >> CREATE TABLE >> local:marko=#* update foo f set f=1; >> UPDATE 0 > >> This query would change meaning with your suggestion. > > I think it wouldn't; Merlin is proposing that f would be taken as the > field name. A more realistic objection goes like this: > > create table foo(f int, g int); > update foo x set x = (1,2); -- works > alter table foo add column x int; > update foo x set x = (1,2,3); -- no longer works > > It's not a real good thing if a column addition or renaming can > so fundamentally change the nature of a query. That's exactly how SELECT works. I also dispute that the user should be surprised in such cases. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think it wouldn't; Merlin is proposing that f would be taken as the >> field name. A more realistic objection goes like this: >> >> create table foo(f int, g int); >> update foo x set x = (1,2); -- works >> alter table foo add column x int; >> update foo x set x = (1,2,3); -- no longer works >> >> It's not a real good thing if a column addition or renaming can >> so fundamentally change the nature of a query. > That's exactly how SELECT works. I also dispute that the user should > be surprised in such cases. Well, the reason we have a problem in SELECT is that we support an unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that "SELECT foo FROM foo" could represent a whole-row selection is nowhere to be found in the SQL standard, for good reason IMO. But we've never had the courage to break cleanly with this Berkeley leftover and insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo". I'd just as soon not introduce the same kind of ambiguity into UPDATE if we have a reasonable alternative. regards, tom lane
On Fri, Oct 17, 2014 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I think it wouldn't; Merlin is proposing that f would be taken as the >>> field name. A more realistic objection goes like this: >>> >>> create table foo(f int, g int); >>> update foo x set x = (1,2); -- works >>> alter table foo add column x int; >>> update foo x set x = (1,2,3); -- no longer works >>> >>> It's not a real good thing if a column addition or renaming can >>> so fundamentally change the nature of a query. > >> That's exactly how SELECT works. I also dispute that the user should >> be surprised in such cases. > > Well, the reason we have a problem in SELECT is that we support an > unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that > "SELECT foo FROM foo" could represent a whole-row selection is nowhere > to be found in the SQL standard, for good reason IMO. But we've never > had the courage to break cleanly with this Berkeley leftover and > insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo". > I'd just as soon not introduce the same kind of ambiguity into UPDATE > if we have a reasonable alternative. Ah, interesting point (I happen to like the terse syntax and use it often). This is for posterity anyways since you guys seem to like Atri's proposal, which surprised me. However, I think you're over simplifying things here. Syntax aside: I think SELECT f FROM foo f; and a hypothetical SELECT row(f.*) FROM foo f; give different semantics. The former gives an object of type 'f' and the latter gives type 'row'. To get parity, you'd have to add an extra cast which means you'd have to play tricky games to avoid extra performance overhead besides being significantly more verbose. I'm aware some of the other QUELisms are pretty dodgy and have burned us in the past (like that whole function as record reference thing) but pulling a record as a field in select is pretty nice. It's also widely used and quite useful in json serialization. merlin
<p dir="ltr"><br /> On Oct 17, 2014 6:16 PM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>wrote:<br /> > A more realistic objection goes like this:<br/> ><br /> > create table foo(f int, g int);<br /> > update foo x set x = (1,2); -- works<br /> > altertable foo add column x int;<br /> > update foo x set x = (1,2,3); -- no longer works<br /> ><br /> > It'snot a real good thing if a column addition or renaming can<br /> > so fundamentally change the nature of a query.<pdir="ltr">I think a significant use case for this feature is when you already have a row-value and want to persistit in the database, like you can do with INSERT:<br /> insert into foo select * from populate_record_json(null::foo,'{...}');<p dir="ltr">In this case the opposite is true: requiring explicit column names wouldbreak the query if you add columns to the table. The fact that you can't reasonably use populate_record/_json with UPDATEis a significant omission. IMO this really speaks for supporting shorthand whole-row assignment, whatever the syntax.<pdir="ltr">Regards,<br /> Marti
On Fri, Oct 17, 2014 at 10:47 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Fri, Oct 17, 2014 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Merlin Moncure <mmoncure@gmail.com> writes: >>> On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> I think it wouldn't; Merlin is proposing that f would be taken as the >>>> field name. A more realistic objection goes like this: >>>> >>>> create table foo(f int, g int); >>>> update foo x set x = (1,2); -- works >>>> alter table foo add column x int; >>>> update foo x set x = (1,2,3); -- no longer works >>>> >>>> It's not a real good thing if a column addition or renaming can >>>> so fundamentally change the nature of a query. >> >>> That's exactly how SELECT works. I also dispute that the user should >>> be surprised in such cases. >> >> Well, the reason we have a problem in SELECT is that we support an >> unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that >> "SELECT foo FROM foo" could represent a whole-row selection is nowhere >> to be found in the SQL standard, for good reason IMO. But we've never >> had the courage to break cleanly with this Berkeley leftover and >> insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo". >> I'd just as soon not introduce the same kind of ambiguity into UPDATE >> if we have a reasonable alternative. > > Ah, interesting point (I happen to like the terse syntax and use it > often). This is for posterity anyways since you guys seem to like > Atri's proposal, which surprised me. However, I think you're over > simplifying things here. Syntax aside: I think > SELECT f FROM foo f; > and a hypothetical > SELECT row(f.*) FROM foo f; > > give different semantics. The former gives an object of type 'f' and > the latter gives type 'row'. To get parity, you'd have to add an > extra cast which means you'd have to play tricky games to avoid extra > performance overhead besides being significantly more verbose. I'm > aware some of the other QUELisms are pretty dodgy and have burned us > in the past (like that whole function as record reference thing) but > pulling a record as a field in select is pretty nice. It's also > widely used and quite useful in json serialization. Been thinking about this in the back of my mind the last couple of days. There are some things you can do with the QUEL 'table as column' in select syntax that are impossible otherwise, at least today, and its usage is going to proliferate because of that. Row construction via () or row() needs to be avoided whenever the column names are important and there is no handy type to cast to. For example, especially during json serialization it's convenient to do things like: select array_agg((select q from (select a, b) q) order by ...) from foo; ...where a,b are fields of foo. FWICT, this is the only way to do json serialization of arbitrary runtime row constructions in a way that does not anonymize the type. Until I figured out this trick I used to create lots of composite types that served no purpose other than to give me a type to cast to which is understandably annoying. if: select (x).* from (select (1, 2) as x) q; worked and properly expanded x to given names should they exist and: SELECT row(f.*) FROM foo f; worked and did same, and: SELECT (row(f.*)).* FROM foo f; was as reasonably performant and gave the same results as: SELECT (f).* FROM foo f; ...then IMSNHO you'd have a reasonable path to deprecating the QUEL inspired syntax. Food for thought. merlin
Hi Atri, Sorry for the delay. With pgconf.eu and all, it's been very hard to find the time to look at this. On 10/15/14, 10:02 AM, Atri Sharma wrote: > Please find attached a patch which implements support for UPDATE table1 > SET(*)=... > The patch supports both UPDATE table SET(*)=(a,b,c) and UPDATE table1 > SET(*)=(SELECT a,b,c FROM...). > It solves the problem of doing UPDATE from a record variable of the same > type as the table e.g. update foo set (*) = (select foorec.*) where ...; Excellent! This is a very welcome change. I've had a few looks at this patch and I have a few comments: 1) This doesn't work for the zero-column table case at all: CREATE TABLE foo(); UPDATE foo SET (*) = (SELECT); ERROR: number of columns does not match number of values 2) What's the purpose of the second condition here? if (!(origTarget) || !(origTarget->name)) 3) The extra parentheses around everything make this code for some reason very hard to read. 4) transformTargetList() is a mess right now. If this is the approach we want to take, the common code should probably be refactored into a function. But the usage of List as a somehow magical way to represent the SET (*) case makes me feel weird inside. 5) The complete lack of regression tests make it hard to poke around the code to try and figure out what each line/condition is trying to do. I feel like I understand what this code is doing and some details feel a bit icky, but I'm not the right person to comment on whether the broad strokes are on the right canvas or not. Maybe someone else wants to take a closer look before Atri spends too much time on this approach? After all, this patch has a very distinctive WIP feel to it, so I guess feedback on the general approach is what's being sought after here, and in that area I consider my skills and knowledge lacking. > The design is simple. It basically expands the * in transformation stage, > does the necessary type checking and adds it to the parse tree. This allows > for normal execution for the rest of the stages. I can't poke any big holes into this approach (disregarding the details of this implementation), but perhaps someone else can? .marko
Marko Tiikkaja <marko@joh.to> writes: > On 10/15/14, 10:02 AM, Atri Sharma wrote: >> Please find attached a patch which implements support for UPDATE table1 >> SET(*)=... > I've had a few looks at this patch and I have a few comments: > 1) This doesn't work for the zero-column table case at all: > CREATE TABLE foo(); > UPDATE foo SET (*) = (SELECT); > ERROR: number of columns does not match number of values > 2) What's the purpose of the second condition here? > if (!(origTarget) || !(origTarget->name)) > 3) The extra parentheses around everything make this code for some > reason very hard to read. > 4) transformTargetList() is a mess right now. If this is the > approach we want to take, the common code should probably be refactored > into a function. But the usage of List as a somehow magical way to > represent the SET (*) case makes me feel weird inside. > 5) The complete lack of regression tests make it hard to poke around > the code to try and figure out what each line/condition is trying to do. > I feel like I understand what this code is doing and some details feel a > bit icky, but I'm not the right person to comment on whether the broad > strokes are on the right canvas or not. Maybe someone else wants to > take a closer look before Atri spends too much time on this approach? FWIW, I opined upthread that transformTargetList was not the place to be handling this; it should be done in the same place(s) that support the existing MultiAssignRef feature, and transformTargetList is not that. I think what's likely missing here is a clear design for the raw parse tree representation (what's returned by the bison grammar). The patch seems to be trying to skate by without creating any new parse node types or fields, but that may well be a bad idea. At the very least there needs to be some commentary added to parsenodes.h explaining what the representation actually is (cf commentary there for MultiAssignRef). Also, I think it's a mistake not to be following the MultiAssignRef model for the case of "(*) = ctext_row". We optimize the ROW-source case at the grammar stage when there's a fixed number of target columns, because that's a very easy transformation --- but you can't do it like that when there's not. It's possible that that optimization should be delayed till later even in the existing case; in general, optimizing in gram.y is a bad habit that's better avoided ... regards, tom lane
On Wed, Nov 26, 2014 at 4:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think what's likely missing here is a clear design for the raw parse > tree representation (what's returned by the bison grammar). The patch > seems to be trying to skate by without creating any new parse node types > or fields, but that may well be a bad idea. At the very least there > needs to be some commentary added to parsenodes.h explaining what the > representation actually is (cf commentary there for MultiAssignRef). > > Also, I think it's a mistake not to be following the MultiAssignRef > model for the case of "(*) = ctext_row". We optimize the ROW-source > case at the grammar stage when there's a fixed number of target columns, > because that's a very easy transformation --- but you can't do it like > that when there's not. It's possible that that optimization should be > delayed till later even in the existing case; in general, optimizing > in gram.y is a bad habit that's better avoided ... Marking as "returned with feedback" based on those comments. -- Michael
On Mon, Dec 8, 2014 at 6:06 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Nov 26, 2014 at 4:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think what's likely missing here is a clear design for the raw parse
> tree representation (what's returned by the bison grammar). The patch
> seems to be trying to skate by without creating any new parse node types
> or fields, but that may well be a bad idea. At the very least there
> needs to be some commentary added to parsenodes.h explaining what the
> representation actually is (cf commentary there for MultiAssignRef).
>
> Also, I think it's a mistake not to be following the MultiAssignRef
> model for the case of "(*) = ctext_row". We optimize the ROW-source
> case at the grammar stage when there's a fixed number of target columns,
> because that's a very easy transformation --- but you can't do it like
> that when there's not. It's possible that that optimization should be
> delayed till later even in the existing case; in general, optimizing
> in gram.y is a bad habit that's better avoided ...
Marking as "returned with feedback" based on those comments.
Thank you everybody for your comments.
I am indeed trying to avoid a new node since my intuitive feeling says that we do not need a new node for a functionality which is present in other commands albeit in different form. However, I agree that documentation explaining parse representation is lacking and shall improve that. Also, in accordance to Tom's comments, I shall look for not touching target list directly and follow the path of MultiAssignRef.
I have moved patch to current CF and marked it as Waiting on Author since I plan to resubmit after addressing the concerns.
Regards,
Atri
Atri
--
Regards,
Atri
l'apprenant
On Mon, Dec 15, 2014 at 7:50 PM, Atri Sharma <atri.jiit@gmail.com> wrote: > I have moved patch to current CF and marked it as Waiting on Author since I > plan to resubmit after addressing the concerns. Nothing happened in the last month, so marking as returned with feedback. -- Michael
Hi all,
Sorry for the delay.
Sorry for the delay.
Please find attached latest version of UPDATE (*) patch.The patch implements review comments and Tom's gripe about touching transformTargetList is addressed now. I have added regression tests and simplified parser representation a bit.
Regards,
Atri
Atri
Attachment
Atri Sharma <atri.jiit@gmail.com> writes: > Please find attached latest version of UPDATE (*) patch.The patch > implements review comments and Tom's gripe about touching > transformTargetList is addressed now. I have added regression tests and > simplified parser representation a bit. I spent a fair amount of time cleaning this patch up to get it into committable shape, but as I was working on the documentation I started to lose enthusiasm for it, because I was having a hard time coming up with compelling examples. The originally proposed motivation was >> It solves the problem of doing UPDATE from a record variable of the same >> type as the table e.g. update foo set (*) = (select foorec.*) where ...; but it feels to me that this is not actually a very good solution to that problem --- even granting that the problem needs to be solved, a claim that still requires some justification IMO. Here is a possible use-case: regression=# create table src (f1 int, f2 text, f3 real); CREATE TABLE regression=# create table dst (f1 int, f2 text, f3 real); CREATE TABLE regression=# create rule r1 as on update to src do instead regression-# update dst set (*) = (new.*) where dst.f1 = old.f1; ERROR: number of columns does not match number of values LINE 2: update dst set (*) = (new.*) where dst.f1 = old.f1; ^ So that's annoying. You can work around it with this very unobvious (and undocumented) syntax hack: regression=# create rule r1 as on update to src do instead regression-# update dst set (*) = (select new.*) where dst.f1 = old.f1; CREATE RULE But what happens if dst has no matching row? Your data goes into the bit bucket, that's what. What you really wanted here was some kind of UPSERT. There's certainly a possible use-case for a notation like this in UPSERT, when we get around to implementing that; but I'm less than convinced that we need it in plain UPDATE. What's much worse though is that the rule that actually gets stored is: regression=# \d+ src Table "public.src" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+-----------+----------+--------------+------------- f1 | integer | | plain | | f2 | text | | extended | | f3 | real | | plain | | Rules: r1 AS ON UPDATE TO src DO INSTEAD UPDATE dst SET (f1, f2, f3) = ( SELECT new.f1, new.f2, new.f3) WHERE dst.f1 = old.f1 That is, you don't actually have any protection at all against future additions of columns, which seems like it would be most of the point of using a notation like this. We could possibly address that by postponing expansion of the *-notation to rule rewrite time, but that seems like it'd be a complete disaster in terms of modularity, or even usability (since column-mismatch errors wouldn't get raised till then either). And it'd certainly be a far more invasive and complex patch than this. So I'm feeling that this may not be a good idea, or at least not a good implementation of the idea. I'm inclined to reject the patch rather than lock us into something that is not standard and doesn't really do what people would be likely to want. Attached is the updated patch that I had before arriving at this discouraging conclusion. regards, tom lane diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 35b0699..adb4473 100644 *** a/doc/src/sgml/ref/update.sgml --- b/doc/src/sgml/ref/update.sgml *************** PostgreSQL documentation *** 25,31 **** UPDATE [ ONLY ] <replaceable class="PARAMETER">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable>] SET { <replaceable class="PARAMETER">column_name</replaceable> = { <replaceable class="PARAMETER">expression</replaceable>| DEFAULT } | ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( { <replaceable class="PARAMETER">expression</replaceable>| DEFAULT } [, ...] ) | ! ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable>) } [, ...] [ FROM <replaceable class="PARAMETER">from_list</replaceable> ] [ WHERE <replaceable class="PARAMETER">condition</replaceable> | WHERE CURRENT OF <replaceable class="PARAMETER">cursor_name</replaceable>] --- 25,33 ---- UPDATE [ ONLY ] <replaceable class="PARAMETER">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable>] SET { <replaceable class="PARAMETER">column_name</replaceable> = { <replaceable class="PARAMETER">expression</replaceable>| DEFAULT } | ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( { <replaceable class="PARAMETER">expression</replaceable>| DEFAULT } [, ...] ) | ! ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable>) | ! (*) = ( { <replaceable class="PARAMETER">expression</replaceable> | DEFAULT } [, ...] ) | ! (*) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable> ) } [, ...] [ FROM <replaceable class="PARAMETER">from_list</replaceable> ] [ WHERE <replaceable class="PARAMETER">condition</replaceable> | WHERE CURRENT OF <replaceable class="PARAMETER">cursor_name</replaceable>] *************** UPDATE [ ONLY ] <replaceable class="PARA *** 164,169 **** --- 166,181 ---- </varlistentry> <varlistentry> + <term><literal>(*)</literal></term> + <listitem> + <para> + In the <literal>SET</> clause, <literal>(*)</literal> stands for a + parenthesized column list of all the table's columns, in order. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><replaceable class="PARAMETER">from_list</replaceable></term> <listitem> <para> *************** UPDATE summary s SET (sum_x, sum_y, avg_ *** 374,379 **** --- 386,398 ---- </para> <para> + Update complete rows from another table having the same set of columns: + <programlisting> + UPDATE dest_table d SET (*) = (SELECT * FROM src_table s WHERE s.id = d.id); + </programlisting> + </para> + + <para> Attempt to insert a new stock item along with the quantity of stock. If the item already exists, instead update the stock count of the existing item. To do this without failing the entire transaction, use savepoints: *************** UPDATE films SET kind = 'Dramatic' WHERE *** 407,413 **** This command conforms to the <acronym>SQL</acronym> standard, except that the <literal>FROM</literal> and <literal>RETURNING</> clauses are <productname>PostgreSQL</productname> extensions, as is the ability ! to use <literal>WITH</> with <command>UPDATE</>. </para> <para> --- 426,434 ---- This command conforms to the <acronym>SQL</acronym> standard, except that the <literal>FROM</literal> and <literal>RETURNING</> clauses are <productname>PostgreSQL</productname> extensions, as is the ability ! to use <literal>WITH</> with <command>UPDATE</>. Also, the <literal>(*)</> ! syntax to specify all columns as a multiple-assignment target is a ! <productname>PostgreSQL</productname> extension. </para> <para> *************** UPDATE films SET kind = 'Dramatic' WHERE *** 419,427 **** </para> <para> ! According to the standard, the source value for a parenthesized sub-list of ! column names can be any row-valued expression yielding the correct number ! of columns. <productname>PostgreSQL</productname> only allows the source value to be a parenthesized list of expressions (a row constructor) or a sub-<literal>SELECT</>. An individual column's updated value can be specified as <literal>DEFAULT</> in the row-constructor case, but not --- 440,449 ---- </para> <para> ! According to the standard, the source value for a multiple-assignment ! target (that is, a parenthesized sub-list of column names) can be any ! row-valued expression yielding the correct number of ! columns. <productname>PostgreSQL</productname> only allows the source value to be a parenthesized list of expressions (a row constructor) or a sub-<literal>SELECT</>. An individual column's updated value can be specified as <literal>DEFAULT</> in the row-constructor case, but not diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 4a5a520..901bd14 100644 *** a/src/backend/parser/analyze.c --- b/src/backend/parser/analyze.c *************** transformUpdateStmt(ParseState *pstate, *** 1899,1906 **** --- 1899,1908 ---- { Query *qry = makeNode(Query); ParseNamespaceItem *nsitem; + Relation target_rel; RangeTblEntry *target_rte; Node *qual; + List *targetList; ListCell *origTargetList; ListCell *tl; *************** transformUpdateStmt(ParseState *pstate, *** 1919,1924 **** --- 1921,1927 ---- interpretInhOption(stmt->relation->inhOpt), true, ACL_UPDATE); + target_rel = pstate->p_target_relation; /* grab the namespace item made by setTargetTable */ nsitem = (ParseNamespaceItem *) llast(pstate->p_namespace); *************** transformUpdateStmt(ParseState *pstate, *** 1937,1943 **** nsitem->p_lateral_only = false; nsitem->p_lateral_ok = true; ! qry->targetList = transformTargetList(pstate, stmt->targetList, EXPR_KIND_UPDATE_SOURCE); qual = transformWhereClause(pstate, stmt->whereClause, --- 1940,2025 ---- nsitem->p_lateral_only = false; nsitem->p_lateral_ok = true; ! /* ! * Before starting the main processing of the target list, we must expand ! * any "(*)" multicolumn target clauses, replacing them with the actual ! * column names of the result relation. Since most UPDATEs won't use that ! * feature, make a quick check to see if it's used and avoid expending the ! * cycles to rebuild the target list if not. ! */ ! targetList = stmt->targetList; ! ! foreach(tl, targetList) ! { ! ResTarget *res = (ResTarget *) lfirst(tl); ! ! if (res->name == NULL) ! break; ! } ! if (tl) ! { ! /* Yes, we have "(*)" ... expand the targetlist */ ! List *newTargetList = NIL; ! ! foreach(tl, targetList) ! { ! ResTarget *res = (ResTarget *) lfirst(tl); ! int i, ! j, ! ncolumns; ! ! /* If not "(*)", just keep it in the list */ ! if (res->name != NULL) ! { ! newTargetList = lappend(newTargetList, res); ! continue; ! } ! ! /* ! * Expand "(*)" into actual column names. First we must count the ! * number of non-dropped target columns. ! */ ! ncolumns = 0; ! for (i = 0; i < target_rel->rd_rel->relnatts; i++) ! { ! Form_pg_attribute att = target_rel->rd_att->attrs[i]; ! ! if (att->attisdropped) ! continue; ! ! ncolumns++; ! } ! j = 0; ! for (i = 0; i < target_rel->rd_rel->relnatts; i++) ! { ! Form_pg_attribute att = target_rel->rd_att->attrs[i]; ! MultiAssignRef *r; ! ResTarget *newres; ! ! if (att->attisdropped) ! continue; ! ! /* Make a MultiAssignRef for each column */ ! r = makeNode(MultiAssignRef); ! r->source = res->val; ! r->colno = ++j; ! r->ncolumns = ncolumns; ! ! /* ... and a ResTarget */ ! newres = makeNode(ResTarget); ! newres->name = pstrdup(NameStr(att->attname)); ! newres->indirection = NIL; ! newres->val = (Node *) r; ! newres->location = res->location; ! ! newTargetList = lappend(newTargetList, newres); ! } ! } ! targetList = newTargetList; ! } ! ! /* Now we can run the main processing of the target list */ ! qry->targetList = transformTargetList(pstate, targetList, EXPR_KIND_UPDATE_SOURCE); qual = transformWhereClause(pstate, stmt->whereClause, *************** transformUpdateStmt(ParseState *pstate, *** 1956,1967 **** */ /* Prepare to assign non-conflicting resnos to resjunk attributes */ ! if (pstate->p_next_resno <= pstate->p_target_relation->rd_rel->relnatts) ! pstate->p_next_resno = pstate->p_target_relation->rd_rel->relnatts + 1; /* Prepare non-junk columns for assignment to target table */ target_rte = pstate->p_target_rangetblentry; ! origTargetList = list_head(stmt->targetList); foreach(tl, qry->targetList) { --- 2038,2049 ---- */ /* Prepare to assign non-conflicting resnos to resjunk attributes */ ! if (pstate->p_next_resno <= target_rel->rd_rel->relnatts) ! pstate->p_next_resno = target_rel->rd_rel->relnatts + 1; /* Prepare non-junk columns for assignment to target table */ target_rte = pstate->p_target_rangetblentry; ! origTargetList = list_head(targetList); foreach(tl, qry->targetList) { *************** transformUpdateStmt(ParseState *pstate, *** 1986,1999 **** origTarget = (ResTarget *) lfirst(origTargetList); Assert(IsA(origTarget, ResTarget)); ! attrno = attnameAttNum(pstate->p_target_relation, ! origTarget->name, true); if (attrno == InvalidAttrNumber) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_COLUMN), errmsg("column \"%s\" of relation \"%s\" does not exist", origTarget->name, ! RelationGetRelationName(pstate->p_target_relation)), parser_errposition(pstate, origTarget->location))); updateTargetListEntry(pstate, tle, origTarget->name, --- 2068,2080 ---- origTarget = (ResTarget *) lfirst(origTargetList); Assert(IsA(origTarget, ResTarget)); ! attrno = attnameAttNum(target_rel, origTarget->name, true); if (attrno == InvalidAttrNumber) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_COLUMN), errmsg("column \"%s\" of relation \"%s\" does not exist", origTarget->name, ! RelationGetRelationName(target_rel)), parser_errposition(pstate, origTarget->location))); updateTargetListEntry(pstate, tle, origTarget->name, diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 88ec83c..56d2af5 100644 *** a/src/backend/parser/gram.y --- b/src/backend/parser/gram.y *************** single_set_clause: *** 9536,9561 **** multiple_set_clause: '(' set_target_list ')' '=' ctext_row { ListCell *col_cell; - ListCell *val_cell; /* ! * Break the ctext_row apart, merge individual expressions ! * into the destination ResTargets. This is semantically ! * equivalent to, and much cheaper to process than, the ! * general case. */ ! if (list_length($2) != list_length($5)) ! ereport(ERROR, ! (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("number of columns does not match number of values"), ! parser_errposition(@5))); ! forboth(col_cell, $2, val_cell, $5) { ResTarget *res_col = (ResTarget *) lfirst(col_cell); ! Node *res_val = (Node *) lfirst(val_cell); ! res_col->val = res_val; } $$ = $2; --- 9536,9562 ---- multiple_set_clause: '(' set_target_list ')' '=' ctext_row { + int ncolumns = list_length($2); + int i = 1; ListCell *col_cell; /* ! * We create a MultiAssignRef source for each target, all ! * of them pointing to the ctext_row List. The ctext_row ! * will get split apart during parse analysis. (We could ! * split it apart here, but it's simpler to share code ! * with the "(*)" target case this way.) */ ! foreach(col_cell, $2) { ResTarget *res_col = (ResTarget *) lfirst(col_cell); ! MultiAssignRef *r = makeNode(MultiAssignRef); ! r->source = (Node *) $5; ! r->colno = i; ! r->ncolumns = ncolumns; ! res_col->val = (Node *) r; ! i++; } $$ = $2; *************** multiple_set_clause: *** 9590,9595 **** --- 9591,9635 ---- $$ = $2; } + | '(' '*' ')' '=' ctext_row + { + ResTarget *res_col = makeNode(ResTarget); + + /* + * Create a ResTarget with null name to represent "(*)". + * We don't bother with MultiAssignRef yet. + */ + res_col->name = NULL; + res_col->indirection = NIL; + res_col->val = (Node *) $5; + res_col->location = @2; + + $$ = list_make1(res_col); + } + | '(' '*' ')' '=' select_with_parens + { + SubLink *sl = makeNode(SubLink); + ResTarget *res_col = makeNode(ResTarget); + + /* First, convert bare SelectStmt into a SubLink */ + sl->subLinkType = MULTIEXPR_SUBLINK; + sl->subLinkId = 0; /* will be assigned later */ + sl->testexpr = NULL; + sl->operName = NIL; + sl->subselect = $5; + sl->location = @5; + + /* + * Create a ResTarget with null name to represent "(*)". + * We don't bother with MultiAssignRef yet. + */ + res_col->name = NULL; + res_col->indirection = NIL; + res_col->val = (Node *) sl; + res_col->location = @2; + + $$ = list_make1(res_col); + } ; set_target: diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index f759606..c5ce95f 100644 *** a/src/backend/parser/parse_expr.c --- b/src/backend/parser/parse_expr.c *************** transformMultiAssignRef(ParseState *psta *** 1446,1451 **** --- 1446,1470 ---- /* We should only see this in first-stage processing of UPDATE tlists */ Assert(pstate->p_expr_kind == EXPR_KIND_UPDATE_SOURCE); + /* + * The grammar should only generate Lists or MULTIEXPR SubLinks as inputs + * to a MultiAssignRef. If it's a List, we need only pick out the n'th + * list element and transform that. + */ + if (IsA(maref->source, List)) + { + List *slist = (List *) maref->source; + Node *source; + + if (list_length(slist) != maref->ncolumns) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("number of columns does not match number of values"), + parser_errposition(pstate, exprLocation(maref->source)))); + source = (Node *) list_nth(slist, maref->colno - 1); + return transformExprRecurse(pstate, source); + } + /* We only need to transform the source if this is the first column */ if (maref->colno == 1) { diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 0e257ac..6456f02 100644 *** a/src/include/nodes/parsenodes.h --- b/src/include/nodes/parsenodes.h *************** typedef struct A_ArrayExpr *** 400,406 **** * * In an UPDATE target list, 'name' is the name of the destination column, * 'indirection' stores any subscripts attached to the destination, and ! * 'val' is the expression to assign. * * See A_Indirection for more info about what can appear in 'indirection'. */ --- 400,408 ---- * * In an UPDATE target list, 'name' is the name of the destination column, * 'indirection' stores any subscripts attached to the destination, and ! * 'val' is the expression to assign. If 'name' is NULL then the ResTarget ! * represents a "(*)" multi-column assignment target (and its 'val' field ! * is either a List of value expressions, or a MULTIEXPR SubLink). * * See A_Indirection for more info about what can appear in 'indirection'. */ diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out index 1de2a86..554aa35 100644 *** a/src/test/regress/expected/update.out --- b/src/test/regress/expected/update.out *************** SELECT * FROM update_test; *** 129,134 **** --- 129,200 ---- | | (4 rows) + -- + -- Test (*) syntax + -- + INSERT INTO update_test VALUES (1001, 1002); + UPDATE update_test SET (*) = (SELECT a, b, 'foo' FROM update_test WHERE b = 101) + WHERE a = 1001; + SELECT * FROM update_test; + a | b | c + ----+-----+----- + 21 | 101 | + 41 | 12 | car + 42 | 12 | car + | | + 21 | 101 | foo + (5 rows) + + UPDATE update_test SET (*) = (DEFAULT, DEFAULT, 'zap') + WHERE c = 'foo'; + SELECT * FROM update_test; + a | b | c + ----+-----+----- + 21 | 101 | + 41 | 12 | car + 42 | 12 | car + | | + 10 | | zap + (5 rows) + + DELETE FROM update_test WHERE c = 'zap'; + -- error cases (too many/few columns) + UPDATE update_test SET (a, b, c) = (1, 2); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (a, b, c) = (1, 2); + ^ + UPDATE update_test SET (a, b, c) = (1, 2, 3, 4); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (a, b, c) = (1, 2, 3, 4); + ^ + UPDATE update_test SET (a, b, c) = (SELECT 1, 2 FROM update_test); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (a, b, c) = (SELECT 1, 2 FROM update_... + ^ + UPDATE update_test SET (a, b, c) = (SELECT 1, 2, 3, 4 FROM update_test); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (a, b, c) = (SELECT 1, 2, 3, 4 FROM u... + ^ + UPDATE update_test SET (*) = (1, 2); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (*) = (1, 2); + ^ + UPDATE update_test SET (*) = (1, 2, 3, 4); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (*) = (1, 2, 3, 4); + ^ + UPDATE update_test SET (*) = (SELECT 1, 2 FROM update_test); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (*) = (SELECT 1, 2 FROM update_test); + ^ + UPDATE update_test SET (*) = (SELECT 1, 2, 3, 4 FROM update_test); + ERROR: number of columns does not match number of values + LINE 1: UPDATE update_test SET (*) = (SELECT 1, 2, 3, 4 FROM update_... + ^ + -- this is a semantic error, but not a syntactic one: + UPDATE update_test SET (*) = (DEFAULT, DEFAULT, DEFAULT), c = 'zap' + WHERE c = 'foo'; + ERROR: multiple assignments to same column "c" -- if an alias for the target table is specified, don't allow references -- to the original table name UPDATE update_test AS t SET b = update_test.b + 10 WHERE t.a = 10; diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql index e71128c..e78de79 100644 *** a/src/test/regress/sql/update.sql --- b/src/test/regress/sql/update.sql *************** UPDATE update_test SET (b,a) = (select a *** 66,71 **** --- 66,101 ---- WHERE a = 11; SELECT * FROM update_test; + -- + -- Test (*) syntax + -- + INSERT INTO update_test VALUES (1001, 1002); + + UPDATE update_test SET (*) = (SELECT a, b, 'foo' FROM update_test WHERE b = 101) + WHERE a = 1001; + + SELECT * FROM update_test; + + UPDATE update_test SET (*) = (DEFAULT, DEFAULT, 'zap') + WHERE c = 'foo'; + + SELECT * FROM update_test; + + DELETE FROM update_test WHERE c = 'zap'; + + -- error cases (too many/few columns) + UPDATE update_test SET (a, b, c) = (1, 2); + UPDATE update_test SET (a, b, c) = (1, 2, 3, 4); + UPDATE update_test SET (a, b, c) = (SELECT 1, 2 FROM update_test); + UPDATE update_test SET (a, b, c) = (SELECT 1, 2, 3, 4 FROM update_test); + UPDATE update_test SET (*) = (1, 2); + UPDATE update_test SET (*) = (1, 2, 3, 4); + UPDATE update_test SET (*) = (SELECT 1, 2 FROM update_test); + UPDATE update_test SET (*) = (SELECT 1, 2, 3, 4 FROM update_test); + -- this is a semantic error, but not a syntactic one: + UPDATE update_test SET (*) = (DEFAULT, DEFAULT, DEFAULT), c = 'zap' + WHERE c = 'foo'; + -- if an alias for the target table is specified, don't allow references -- to the original table name UPDATE update_test AS t SET b = update_test.b + 10 WHERE t.a = 10;
I wrote: > So I'm feeling that this may not be a good idea, or at least not a good > implementation of the idea. I'm inclined to reject the patch rather than > lock us into something that is not standard and doesn't really do what > people would be likely to want. BTW, a potentially workable fix to the problem of not wanting to lock down column lists in stored rules is to create a syntax that represents whole-row, record-oriented assignment directly. Then we need not be concerned with individual columns at parse time at all. So imagine something like this: UPDATE dst SET * = new WHERE ...; UPDATE dst SET * = (SELECT src FROM src WHERE ...); or if you needed to construct a row value at runtime you could write UPDATE dst SET * = ROW(x,y,z) WHERE ...; UPDATE dst SET * = (SELECT ROW(x,y,z) FROM src WHERE ...); The main bit of functionality that would be lost compared to the current patch is the ability to use DEFAULT for some of the row members. But I am not sure there is a compelling use-case for that: seems like if you have some DEFAULTs in there then it's unlikely that you don't know the column list accurately, so the existing (col1,col2,...) syntax will serve fine. This seems like it might not be unduly complex to implement, although it would have roughly nothing in common with the current patch. If we were to go in this direction, it would be nice to at the same time add a similar whole-record syntax for INSERT. I'm not sure exactly what that should look like though. Also, again, we ought to be paying attention to how this would match up with UPSERT syntax. regards, tom lane
Tom Lane wrote: > I spent a fair amount of time cleaning this patch up to get it into > committable shape, but as I was working on the documentation I started > to lose enthusiasm for it, because I was having a hard time coming up > with compelling examples. The originally proposed motivation was > > >> It solves the problem of doing UPDATE from a record variable of the same > >> type as the table e.g. update foo set (*) = (select foorec.*) where ...; > > but it feels to me that this is not actually a very good solution to that > problem --- even granting that the problem needs to be solved, a claim > that still requires some justification IMO. How about an UPDATE ran inside a plpgsql function, which is using a row variable of the table type? You could assign values to individual columns of q row variable, and run the multicolumn UPDATE last. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 7, 2015 at 11:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > If we were to go in this direction, it would be nice to at the same time > add a similar whole-record syntax for INSERT. I'm not sure exactly what > that should look like though. Also, again, we ought to be paying > attention to how this would match up with UPSERT syntax. I expressed concern about allowing this for UPSERT [1]. To be fair, VoltDB's new UPSERT statement allows something similar (or rather mandates it, since you cannot just update some columns), and that doesn't look wholly unreasonable. I still don't like the idea of supporting this, though. I'm not aware of any other system allowing something like this for either MERGE or a non-standard UPSERT. [1] http://www.postgresql.org/message-id/CAM3SWZT=VXBJ7QKAidAmYbU40aP10udSqOOqhViX3Ykj7WBv9A@mail.gmail.com -- Peter Geoghegan
On Tue, Apr 7, 2015 at 12:01 PM, Peter Geoghegan <pg@heroku.com> wrote: > I still don't like the idea of > supporting this, though. I'm not aware of any other system allowing > something like this for either MERGE or a non-standard UPSERT. That includes MySQL, BTW. Only their REPLACE statement (which is a disaster for various reasons) can do something like this. Their INSERT ... ON DUPLICATE KEY UPDATE statement (which is roughly comparable to the proposed UPSERT patch) cannot update the entire row using a terse expression that references a row excluded from insertion (following the implementation taking the UPDATE path). -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes: > On Tue, Apr 7, 2015 at 11:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If we were to go in this direction, it would be nice to at the same time >> add a similar whole-record syntax for INSERT. I'm not sure exactly what >> that should look like though. Also, again, we ought to be paying >> attention to how this would match up with UPSERT syntax. > I expressed concern about allowing this for UPSERT [1]. Yeah, your analogy to "SELECT *" being considered dangerous in production is not without merit. However, to the extent that the syntax is used to assign from a composite variable of the same (or compatible) data type, it seems like it would be safe enough. IOW, I think that analogy holds for the syntax implemented by the current patch, but not what I suggested in my followup. regards, tom lane
On 4/7/15 2:00 PM, Alvaro Herrera wrote: > Tom Lane wrote: > >> I spent a fair amount of time cleaning this patch up to get it into >> committable shape, but as I was working on the documentation I started >> to lose enthusiasm for it, because I was having a hard time coming up >> with compelling examples. The originally proposed motivation was >> >>>> It solves the problem of doing UPDATE from a record variable of the same >>>> type as the table e.g. update foo set (*) = (select foorec.*) where ...; >> >> but it feels to me that this is not actually a very good solution to that >> problem --- even granting that the problem needs to be solved, a claim >> that still requires some justification IMO. > > How about an UPDATE ran inside a plpgsql function, which is using a row > variable of the table type? You could assign values to individual > columns of q row variable, and run the multicolumn UPDATE last. Along similar lines, I've often wanted something akin to *, but allowing finer control over what you got. Generally when I want this it's because I really do want everything (as in, don't want to re-code a bunch of stuff if a column is added), but perhaps not the surrogate key field. Or I want everything, but rename some field to something else. Basically, another way to do what Alvaro is suggesting (though, the ability to rename something is new...) If we had that ability I think UPDATE * would become a lot more useful. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> I spent a fair amount of time cleaning this patch up to get itTom> into committable shape, but as I was working on thedocumentationTom> I started to lose enthusiasm for it, because I was having a hardTom> time coming up with compellingexamples. The originally proposedTom> motivation was >>> It solves the problem of doing UPDATE from a record variable of the same>>> type as the table e.g. update foo set (*)= (select foorec.*) where ...; There are a number of motivating examples for this (which have nothing to do with rules; I doubt anyone cares much about that). The fundamental point is that currently, given a table "foo" and some column or variable of foo's rowtype, you can do this: insert into foo select foorec.* [from ...] but there is no comparable way to do an update without naming every column explicitly, the closest being update foo set (a,b,...) = (foorec.a, foorec.b, ...) where ... One example that comes up occasionally (and that I've had to do myself more than once) is this: given a table "foo" and another with identical schema "reference_foo", apply appropriate inserts, updates and deletes to table "foo" to make the content of the two tables identical. This can be done these days with wCTEs: with t_diff as (select o.id as o_id, n.id as n_id, o, n from foo o full outer join reference_foo n on (o.id=n.id) where (o.*) is distinct from (n.*)), ins as (insert into foo select (n).* from t_diff where o_id isnull), del as (delete from foo where id in (select o_id from t_diff where n_id is null)), upd as (update foo set (col1,col2,...) = ((n).col1,(n).col2,...) -- XXX from t_diff where foo.id = n_id and o_id= n_id) select count(*) filter (where o_id is null) as num_ins, count(*) filter (where o_id = n_id) as num_upd, count(*)filter (where n_id is null) as num_del from t_diff; (This would be preferred over simply replacing the table content if the table is large and the changes few, you want to audit the changes, you need to avoid interfering with concurrent selects, etc.) The update part of that would be much improved by simply being able to say "update all columns of foo with values from (n)". The exact syntax isn't a big deal - though since SET (cols...) = ... is in the spec, it seems reasonable to at least keep some kind of consistency with it. Other examples arise from things one might want to do in plpgsql; for example to update a record from an hstore or json value, one can use [json_]populate_record to construct a record variable, but then it's back to naming all the columns in order to actually perform the update statement. [My connection with this patch is only that I suggested it to Atri as a possible project for him to do, because I wanted the feature and knew others did also, and helped explain how the existing MultiAssign worked and some of the criticism. I did not contribute any code.] -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> I spent a fair amount of time cleaning this patch up to get it > Tom> into committable shape, but as I was working on the documentation > Tom> I started to lose enthusiasm for it, because I was having a hard > Tom> time coming up with compelling examples. > One example that comes up occasionally (and that I've had to do myself > more than once) is this: given a table "foo" and another with identical > schema "reference_foo", apply appropriate inserts, updates and deletes > to table "foo" to make the content of the two tables identical. This can > be done these days with wCTEs: > with > t_diff as (select o.id as o_id, n.id as n_id, o, n > from foo o full outer join reference_foo n on (o.id=n.id) > where (o.*) is distinct from (n.*)), > ins as (insert into foo select (n).* from t_diff where o_id is null), > del as (delete from foo > where id in (select o_id from t_diff where n_id is null)), > upd as (update foo > set (col1,col2,...) = ((n).col1,(n).col2,...) -- XXX > from t_diff > where foo.id = n_id and o_id = n_id) > select count(*) filter (where o_id is null) as num_ins, > count(*) filter (where o_id = n_id) as num_upd, > count(*) filter (where n_id is null) as num_del > from t_diff; While I agree that the UPDATE part of that desperately needs improvement, I don't agree that the INSERT part is entirely fine. You're still relying on a parse-time expansion of the (n).* notation, which is inefficient and not at all robust against schema changes (the same problem as with the patch's approach to UPDATE). So if we're taking this as a motivating example, I'd want to see a fix that allows both INSERT and UPDATE directly from a composite value of proper rowtype, without any expansion to individual columns at all. Perhaps we could adopt some syntax likeINSERT INTO table (*) values-or-select to represent the case that the values-or-select delivers a single composite column of the appropriate type. > Other examples arise from things one might want to do in plpgsql; for > example to update a record from an hstore or json value, one can use > [json_]populate_record to construct a record variable, but then it's > back to naming all the columns in order to actually perform the update > statement. Sure, but the patch as given didn't work very well for that either, at least not if you wanted to avoid multiple evaluation of the composite-returning function. You'd have to adopt some obscure syntax like "UPDATE target SET (*) = (SELECT * FROM composite_function(...))". With what I'm thinking about now you could doUPDATE target SET * = composite_function(...) which is a good deal less notation, and with a bit of luck it would not require disassembling and reassembling the function's output tuple. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> One example that comes up occasionally (and that I've had to do>> myself more than once) is this: given a table "foo"and another with>> identical schema "reference_foo", apply appropriate inserts, updates>> and deletes to table "foo"to make the content of the two tables>> identical. This can be done these days with wCTEs: >> with>> t_diff as (select o.id as o_id, n.id as n_id, o, n>> from foo o full outer join reference_foon on (o.id=n.id)>> where (o.*) is distinct from (n.*)),>> ins as (insert into foo select (n).*from t_diff where o_id is null),>> del as (delete from foo>> where id in (select o_id from t_diff wheren_id is null)),>> upd as (update foo>> set (col1,col2,...) = ((n).col1,(n).col2,...) -- XXX>> from t_diff>> where foo.id = n_id and o_id = n_id)>> select count(*) filter (where o_id is null) as num_ins,>> count(*) filter (where o_id = n_id) as num_upd,>> count(*) filter (where n_id is null) as num_del>> from t_diff; Tom> While I agree that the UPDATE part of that desperately needsTom> improvement, I don't agree that the INSERT part isentirely fine.Tom> You're still relying on a parse-time expansion of the (n).*Tom> notation, which is inefficient Not in my experience a huge deal given what else is going on... Tom> and not at all robust against schema changes (the same problem asTom> with the patch's approach to UPDATE). Now this I think is wrong; I think it's just as robust against schema changes as using the composite value directly would be. Consider the case where foo and reference_foo match with the exception of dropped columns; the code as it is should just work, while a variant that used the composite values would have to explicitly deal with that. (When I've used this kind of operation in practice, reference_foo has just been created using CREATE TEMP TABLE reference_foo (LIKE foo); and then populated via COPY from an external data source. Even if reference_foo were a non-temp table, the logic of the situation requires it to have the same schema as foo as far as SQL statements are concerned.) Tom> So if we're taking this as a motivating example, I'd want to see aTom> fix that allows both INSERT and UPDATE directlyfrom a compositeTom> value of proper rowtype, without any expansion to individualTom> columns at all. I would argue that this is a case of the perfect being the enemy of the good. Tom> Perhaps we could adopt some syntax likeTom> INSERT INTO table (*) values-or-selectTom> to represent the case thatthe values-or-select delivers a singleTom> composite column of the appropriate type. We could, but I think in all practical cases it'll be nothing more than a small performance optimization rather than something that really benefits people in terms of enhanced functionality. >> Other examples arise from things one might want to do in plpgsql; for>> example to update a record from an hstore or jsonvalue, one can use>> [json_]populate_record to construct a record variable, but then it's>> back to naming all the columnsin order to actually perform the update>> statement. Tom> Sure, but the patch as given didn't work very well for thatTom> either, Partly that's a limitation resulting from how much can be done with the existing SET (...) = syntax and implementation without ripping it all out and starting over. An incremental improvement seemed to be a more readily achievable goal. In practice it would indeed probably look like: declare new_id integer; new_values hstore; begin /* do stuff */ update foo set (*) = (select * from populate_record(foo,new_values)) where foo.id = new_id; A agree that it would be nicer to do update foo set (*) = populate_record(foo, new_values) where foo.id = new_id; but that would be a substantially larger project. The alternative of set * = populate_record(foo, new_values) is less satisfactory since it introduces inconsistencies with the case where you _do_ want to specify explicit columns, unless you also allow set (a,b) = row_value which is required by the spec for T641 but which we don't currently have. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> and not at all robust against schema changes (the same problem as > Tom> with the patch's approach to UPDATE). > Now this I think is wrong; I think it's just as robust against schema > changes as using the composite value directly would be. Consider the > case where foo and reference_foo match with the exception of dropped > columns; the code as it is should just work, while a variant that used > the composite values would have to explicitly deal with that. AFAICS you've got that backwards. As I illustrated upthread, after parse-time expansion we would have a command that explicitly tells the system to insert or update only the enumerated columns. That will *not* work as desired if columns are added later, and (if it's in a rule) I would expect the system to have to fail a DROP command trying to drop one of those named columns, the same way you can't drop a column referenced in a view/rule today. On the other hand, if it's a composite-type assignment, then dealing with things like dropped columns becomes the system's responsibility, and we already have code for that sort of thing. > ... The alternative of > set * = populate_record(foo, new_values) > is less satisfactory since it introduces inconsistencies with the case > where you _do_ want to specify explicit columns, unless you also allow > set (a,b) = row_value > which is required by the spec for T641 but which we don't currently > have. Right, but we should be trying to move in that direction. I see your point though that (*) is more notationally consistent with that case. Maybe we should be looking at trying to implement T641 in full and then including (*) as a special case of that. Anyway, my core point here is that we should avoid parse-time expansion of the column set. The *only* reason for doing that is that it makes the patch simpler, not that there is some functional advantage; in fact there are considerable functional disadvantages as we've just been through. So I do not want to commit a patch that institutionalizes those disadvantages just for short-term implementation simplicity. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> Now this I think is wrong; I think it's just as robust against>> schema changes as using the composite value directlywould>> be. Consider the case where foo and reference_foo match with the>> exception of dropped columns; the codeas it is should just work,>> while a variant that used the composite values would have to>> explicitly deal with that. Tom> AFAICS you've got that backwards. Tom> As I illustrated upthread, after parse-time expansion we wouldTom> have a command that explicitly tells the system toinsert orTom> update only the enumerated columns. That will *not* work asTom> desired if columns are added later, Where "later" is between parse analysis and execution - and if this query is not in a rule, then any such schema change will force a re-analysis if it's a prepared statement, no? and otherwise, we have the tables locked against schema changes anyway? Is there a failure case here that doesn't involve rules? Tom> and (if it's in a rule) well, the example I gave is not something that anyone in their right minds would try and put in a rule. >> ... The alternative of>> set * = populate_record(foo, new_values)>> is less satisfactory since it introduces inconsistencieswith the case>> where you _do_ want to specify explicit columns, unless you also allow>> set (a,b) = row_value>>which is required by the spec for T641 but which we don't currently>> have. Tom> Right, but we should be trying to move in that direction. I seeTom> your point though that (*) is more notationallyconsistent withTom> that case. Maybe we should be looking at trying to implement T641Tom> in full and then including(*) as a special case of that. I would have liked to have done that, but that would have raised the complexity of the project from "Atri can take a stab at this one with negligible supervision" to "Andrew will have to spend more time than he has conveniently available staring at the raw parser to try and make it work". As I said, the perfect is the enemy of the good. Tom> Anyway, my core point here is that we should avoid parse-timeTom> expansion of the column set. Parse-time expansion of * is pretty widespread elsewhere. Changing that for this one specific case seems a bit marginal to me - and if the main motivation to do so is to support cases in DML rules, which are already a foot-bazooka, I think it's honestly not worth it. -- Andrew (irc:RhodiumToad)
Andrew Gierth wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> Right, but we should be trying to move in that direction. I see > Tom> your point though that (*) is more notationally consistent with > Tom> that case. Maybe we should be looking at trying to implement T641 > Tom> in full and then including (*) as a special case of that. > > I would have liked to have done that, but that would have raised the > complexity of the project from "Atri can take a stab at this one with > negligible supervision" to "Andrew will have to spend more time than he > has conveniently available staring at the raw parser to try and make it > work". Not to mention that, at this stage, we should be looking at reducing the scope of patches in commitfest rather than enlarge it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Andrew Gierth wrote: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >>> Tom> Right, but we should be trying to move in that direction. I see >>> Tom> your point though that (*) is more notationally consistent with >>> Tom> that case. Maybe we should be looking at trying to implement T641 >>> Tom> in full and then including (*) as a special case of that. >> I would have liked to have done that, but that would have raised the >> complexity of the project from "Atri can take a stab at this one with >> negligible supervision" to "Andrew will have to spend more time than he >> has conveniently available staring at the raw parser to try and make it >> work". Well, we've never considered implementation convenience to be more important than getting it right, and this doesn't seem like a place to start. (FWIW, the raw-parser changes would be a negligible fraction of the work involved to do it like that.) > Not to mention that, at this stage, we should be looking at reducing the > scope of patches in commitfest rather than enlarge it. I already took it out of the current commitfest ;-). regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Well, we've never considered implementation convenience to be moreTom> important than getting it right, and this doesn'tseem like aTom> place to start. I quote your own words from less than a year ago: The standard actually says that the source for a multi-column assignment could be any row-valued expression. The implementationused here is tightly tied to our existing sub-SELECT support and can't handle other cases; the Bison grammarwould have some issues with them too. However, I don't feel too bad about this since other cases can be convertedinto sub-SELECTs. For instance, "SET (a,b,c) = row_valued_function(x)" could be written "SET (a,b,c) = (SELECT* FROM row_valued_function(x))". (from the commit message for 8f889b108) Tom> (FWIW, the raw-parser changes would be a negligible fraction ofTom> the work involved to do it like that.) /** Ideally, we'd accept any row-valued a_expr as RHS of a multiple_set_clause.* However, per SQL spec the row-constructorcase must allow DEFAULT as a row* member, and it's pretty unclear how to do that (unless perhaps we allow*DEFAULT in any a_expr and let parse analysis sort it out later?). For the* moment, the planner/executor only supporta subquery as a multiassignment* source anyhow, so we need only accept ctext_row and subqueries here.*/ (The "punt to parse analysis" solution looks reasonable to me, but I'm just as in the dark as you were when you wrote that as to any other possible solution.) -- Andrew (irc:RhodiumToad)