Thread: doc: Clarify what "excluded" represents for INSERT ON CONFLICT
Hi,
Reposting this on its own thread.
As one cannot place excluded in a FROM clause (subquery) in the
ON CONFLICT clause referring to it as a table, with plural rows
nonetheless, leads the reader to infer more about what the
behavior here is than is correct. We already just say use the
table's name for the existing row so just match that pattern
of using the name excluded for the proposed row.
The alias description doesn't have the same issue regarding the
use of the word table and rows, as the use there is more conceptual,
but the wording about "otherwise taken as" is wrong: rather two
labels of excluded end up in scope and you get an ambiguous name error.
The error messages still consider excluded to be a table reference
and this patch does not try to change that. That implementation
detail need not force the user-facing documentation for the feature
to use the term table when it doesn't really apply.
ON CONFLICT clause referring to it as a table, with plural rows
nonetheless, leads the reader to infer more about what the
behavior here is than is correct. We already just say use the
table's name for the existing row so just match that pattern
of using the name excluded for the proposed row.
The alias description doesn't have the same issue regarding the
use of the word table and rows, as the use there is more conceptual,
but the wording about "otherwise taken as" is wrong: rather two
labels of excluded end up in scope and you get an ambiguous name error.
The error messages still consider excluded to be a table reference
and this patch does not try to change that. That implementation
detail need not force the user-facing documentation for the feature
to use the term table when it doesn't really apply.
David J.
Attachment
On Thu, Jun 9, 2022 at 11:40 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > As one cannot place excluded in a FROM clause (subquery) in the > ON CONFLICT clause referring to it as a table, ... Well, it would be nice if you had included a test case rather than leaving it to the reviewer or committer to construct one. In general, dropping subtle patches with minimal commentary isn't really very helpful. But I decided to dig in and see what I could figure out. I constructed this test case first, which does work: rhaas=# create table foo (a int primary key, b text); CREATE TABLE rhaas=# insert into foo values (1, 'blarg'); INSERT 0 1 rhaas=# insert into foo values (1, 'frob') on conflict (a) do update set b = (select excluded.b || 'nitz'); INSERT 0 1 rhaas=# select * from foo; a | b ---+---------- 1 | frobnitz (1 row) Initially I thought that was the case you were talking about, but after staring at your email for another 20 minutes, I figured out that you're probably talking about something more like this, which doesn't work: rhaas=# insert into foo values (1, 'frob') on conflict (a) do update set b = (select b || 'nitz' from excluded); ERROR: relation "excluded" does not exist LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded); I do find that a bit of a curious error message, because that relation clearly DOES exist in the range table. I know that because, if I use a wrong column name, I get a complaint about the column not existing, not the relation not existing: rhaas=# insert into foo values (1, 'frob') on conflict (a) do update set b = (select excluded.bbbbbbbbb || 'nitz'); ERROR: column excluded.bbbbbbbbb does not exist LINE 1: ...'frob') on conflict (a) do update set b = (select excluded.b... That said, I am not convinced that changing the documentation in this way is a good idea. It is clear that, at the level of the code, "excluded" behaves like a pseudo-table, and the fact that it isn't equivalent to a real table in all ways, or that it can't be referenced at every point in the query equally, doesn't change that. I don't think that the language you're proposing is horrible or anything -- the distinction between a special table and a special name that behaves somewhat like a single-row table is subtle at best -- but I think that the threshold to commit a patch like this is that the change has to be a clear improvement, and I don't think it is. I think it might be fruitful to consider whether some of the error messages here could be improved or even whether some of the non-working cases could be made to work, but I'm just not really seeing the value of tinkering with documentation which is, in my view, not wrong. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Jun 30, 2022 at 1:43 PM Robert Haas <robertmhaas@gmail.com> wrote: > rhaas=# insert into foo values (1, 'frob') on conflict (a) do update > set b = (select b || 'nitz' from excluded); > ERROR: relation "excluded" does not exist > LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded); > > I do find that a bit of a curious error message, because that relation > clearly DOES exist in the range table. Let's say that we supported this syntax. I see some problems with that (you may have thought of these already). Thinking of "excluded" as a separate table creates some very thorny questions, such as: How would the user be expected to handle the case where there was more than a single "row" in "excluded"? How could the implementation know what the contents of the "excluded table" were ahead of time in the multi-row-insert case? We'd have to know about *all* of the conflicts for all rows proposed for insertion to do this, which seems impossible in the general case -- because some of those conflicts won't have happened yet, and cannot be predicted. The "excluded" pseudo-table is conceptually similar to an from_item alias used within an UPDATE .... FROM ... where the target table is also the from_item table (i.e. there is a self-join). There is only one table involved. -- Peter Geoghegan
On Thu, Jun 30, 2022 at 1:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jun 9, 2022 at 11:40 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> As one cannot place excluded in a FROM clause (subquery) in the
> ON CONFLICT clause referring to it as a table, ...
Well, it would be nice if you had included a test case rather than
leaving it to the reviewer or committer to construct one. In general,
dropping subtle patches with minimal commentary isn't really very
helpful.
Fair point.
But I decided to dig in and see what I could figure out. I constructed
this test case first, which does work:
rhaas=# create table foo (a int primary key, b text);
CREATE TABLE
rhaas=# insert into foo values (1, 'blarg');
INSERT 0 1
rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
set b = (select excluded.b || 'nitz');
INSERT 0 1
rhaas=# select * from foo;
a | b
---+----------
1 | frobnitz
(1 row)
Initially I thought that was the case you were talking about, but
after staring at your email for another 20 minutes, I figured out that
you're probably talking about something more like this, which doesn't
work:
rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
set b = (select b || 'nitz' from excluded);
ERROR: relation "excluded" does not exist
LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded);
Right, the word "excluded" appearing immediately after the word FROM is what I meant by:
"As one cannot place excluded in a FROM clause (subquery) in the
ON CONFLICT clause"It is clear that, at the level of the code,
"excluded" behaves like a pseudo-table,
And people in the code are capable of understanding this without difficulty no matter how we write it. They are not the target audience.
but I
think that the threshold to commit a patch like this is that the
change has to be a clear improvement, and I don't think it is.
I'm hoping for "more clear and accurate without making things worse"...
The fact that it does not and cannot use FROM and that it never refers to more than a single row (which is what motivated the change in the first place) for me make using the word table here more trouble than it is worth.
I think it might be fruitful to consider whether some of the error
messages here could be improved
Possibly...
or even whether some of the
non-working cases could be made to work,
That would, IMO, make things worse. "excluded" isn't a table in that sense, anymore than "NEW" and "OLD" in the context of triggers.
but I'm just not really
seeing the value of tinkering with documentation which is, in my view,
not wrong.
Current:
"The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
existing row using the table's name (or an alias), and to [rows] proposed
for insertion using the special excluded table."
"The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
existing row using the table's name (or an alias), and to [rows] proposed
for insertion using the special excluded table."
The word table in that sentence is wrong and not a useful way to think of the thing which we've named excluded. It is a single value of a composite type having the structure of the named table.
I'll agree that most people will mentally paper over the difference and go merrily on their way. At least one person recently did not do that, which prompted an email to the community, which prompted a response and this suggestion to avoid that in the future while, IMO, not making understanding of the concept any less clear.
David J.
On Thu, Jun 30, 2022 at 2:07 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > Current: > "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the > existing row using the table's name (or an alias), and to [rows] proposed > for insertion using the special excluded table." > > The word table in that sentence is wrong and not a useful way to think of the thing which we've named excluded. It isa single value of a composite type having the structure of the named table. I think that your reasoning is correct, but I don't agree with your conclusion. The term "special excluded table" is a fudge, but that isn't necessarily a bad thing. Sure, we could add something about the UPDATE being similar to an UPDATE with a self-join, as I said upthread. But I think that that would make the concept harder to grasp. > I'll agree that most people will mentally paper over the difference and go merrily on their way. At least one person recentlydid not do that, which prompted an email to the community Can you provide a reference for this? Didn't see anything like that in the reference you gave upthread. I have a hard time imagining a user that reads the INSERT docs and imagines that "excluded" is a relation that is visible to the query in ways that are not limited to expression evaluation for the UPDATE's WHERE/SET. The way that it works (and doesn't work) follows naturally from what a user would want to do in order to upsert. MySQL's INSERT ... ON DUPLICATE KEY UPDATE feature has a magical UPSERT-only expression instead of "excluded". -- Peter Geoghegan
On Thu, Jun 30, 2022 at 2:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Jun 30, 2022 at 2:07 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Current:
> "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
> existing row using the table's name (or an alias), and to [rows] proposed
> for insertion using the special excluded table."
>
> The word table in that sentence is wrong and not a useful way to think of the thing which we've named excluded. It is a single value of a composite type having the structure of the named table.
I think that your reasoning is correct, but I don't agree with your
conclusion. The term "special excluded table" is a fudge, but that
isn't necessarily a bad thing. Sure, we could add something about the
UPDATE being similar to an UPDATE with a self-join, as I said
upthread. But I think that that would make the concept harder to
grasp.
I don't think incorporating self-joining to be helpful; the status quo is better than that. I believe people mostly think of "composite variable" from the current description even if we don't use those words - or such a concept can be explained by analogy with NEW and OLD (I think of it like a trigger, only that SQL doesn't have variables so we cannot use that term, hence just using "name").
> I'll agree that most people will mentally paper over the difference and go merrily on their way. At least one person recently did not do that, which prompted an email to the community
Can you provide a reference for this? Didn't see anything like that in
the reference you gave upthread.
OK, the discussion I am recalling happened on Discord hence the lack of a link.
On roughly 3/8 the following conversation occurred (I've trimmed out some non-relevant comments):
>>>OP
Hello, I have a simple question.
My table has a column 'transaction_date'I have an insert statement with an ON CONFLICT statement
I update using the 'excluded' values, but I only want to update if the transaction date is the same or newer.
Do I just use: "WHERE EXCLUDED.transaction_date >= transaction_date"?
so the full query is something like: INSERT INTO table VALUES (pk, yadda1, yadda2) ON CONFLICT (pk) DO UPDATE SET (yadda1 = EXCLUDED.yadda1, yadda2 = EXCLUDED.yadda2) WHERE EXCLUDED.transaction_date >= transaction_date;
>>>Other Person
I mean, the ... like 3 examples imply what it contains, and it vaguely says "and to rows proposed for insertion using the special excluded table." but...
Still, based on the BNF, that should work as you stated it.>>>OP
would perhaps it try to overwrite more than one row because many rows would meet the criteria?
It seems like it limits it to the conflict row but..>>>Other Person
Well, you're only conflicting on the PK, which is guaranteed to be unique.
>>>OP
>>>OP
Ah, so then it is limited to that row if it is specified within the ON CONFLICT action if I am reading correct.
[...]
If it matters to you, the only thing I got wrong apparently (in my limited non-sufficient testing) is that to access the current value within the table row you must use the table name. So: WHERE EXCLUDED.transaction_date >= tableName.transaction_date
>>>ME
"have access [...] to rows proposed for insertion using the special excluded table.". You have an update situation where two tables (the target and "excluded") are in scope with the exact same column names (by definition) so any column references in the value expressions need to be prefixed with which of the two tables you want to examine. As with a normal UPDATE, the left side of the SET clause entry must reference the target table and so its column cannot, and must not, be table qualified.
While it speaks of "rows" this is basically a per-row thing. As each row is tested and finds a conflict the update is executed."have access [...] to rows proposed for insertion using the special excluded table.". You have an update situation where two tables (the target and "excluded") are in scope with the exact same column names (by definition) so any column references in the value expressions need to be prefixed with which of the two tables you want to examine. As with a normal UPDATE, the left side of the SET clause entry must reference the target table and so its column cannot, and must not, be table qualified.
>>>Other Person
Mentioning something as critical as that offhand is a mistake IMO. It should have its own section.
It's also not mentioned in the BNF, though it shows up in the examples. You have to basically infer everything.
It's also not mentioned in the BNF, though it shows up in the examples. You have to basically infer everything.
>>>ME
The exact wording of the conflict_action description in head is: "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table." I haven't read anything here that gives me a hint as to how that ended up misinterpreted so that I could possibly formulate an alternative wording. And I cannot think of a more appropriate place to locate that sentence either. The examples do cover this and the specifics here are not something that we try to represent in BNF.
I'd probably change "and to rows proposed for insertion" to "and to the corresponding row proposed for insertion".
I'd probably change "and to rows proposed for insertion" to "and to the corresponding row proposed for insertion".
>>>OP
This does not change the original conclusion we arrived at correct? If I am reading what you are saying right, since it only discovered the conflict after examining the row, then by the same token it will only affect the same row where the conflict was detected.
I have a hard time imagining a user that reads the INSERT docs and
imagines that "excluded" is a relation that is visible to the query in
ways that are not limited to expression evaluation for the UPDATE's
WHERE/SET.
Yes, and based on a single encounter I agree this doesn't seem like a broadly encountered issue. My takeaway from that eventually led to this proposal. The "Other Person" who is complaining about the docs is one of the mentors on the Discord server and works for one of the corporate contributors to the community. (I suppose Discord is considered public so maybe this redaction is unnecessary...)
David J.
On Thu, Jun 30, 2022 at 3:07 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > Yes, and based on a single encounter I agree this doesn't seem like a broadly encountered issue. My takeaway from thateventually led to this proposal. The "Other Person" who is complaining about the docs is one of the mentors on the Discordserver and works for one of the corporate contributors to the community. (I suppose Discord is considered public somaybe this redaction is unnecessary...) My impression from reading this transcript is that the user was confused as to why they needed to qualify the target table name in the ON CONFLICT DO UPDATE's WHERE clause -- they didn't have to qualify it in the targetlist that appears in "SET ... ", so why the need to do it in the WHERE clause? This isn't something that upsert statements need to do all that often, just because adding additional conditions to the WHERE clause isn't usually necessary. That much makes sense to me -- I *can* imagine how that could cause confusion. If that interpretation is correct, then it's not clear what it should mean for how the INSERT documentation describes EXCLUDED. EXCLUDED is involved here, since EXCLUDED is the thing that creates the ambiguity, but that seems almost incidental to me. This user would probably not have been confused if they didn't need to use a WHERE clause (very much the common case), even when expression evaluation involving EXCLUDED in the SET was still used (also common). -- Peter Geoghegan
On Thu, Jun 30, 2022 at 5:05 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Jun 30, 2022 at 1:43 PM Robert Haas <robertmhaas@gmail.com> wrote: > > rhaas=# insert into foo values (1, 'frob') on conflict (a) do update > > set b = (select b || 'nitz' from excluded); > > ERROR: relation "excluded" does not exist > > LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded); > > > > I do find that a bit of a curious error message, because that relation > > clearly DOES exist in the range table. > > Let's say that we supported this syntax. I see some problems with that > (you may have thought of these already). Thinking of "excluded" as a > separate table creates some very thorny questions, such as: > > How would the user be expected to handle the case where there was more > than a single "row" in "excluded"? How could the implementation know > what the contents of the "excluded table" were ahead of time in the > multi-row-insert case? We'd have to know about *all* of the conflicts > for all rows proposed for insertion to do this, which seems impossible > in the general case -- because some of those conflicts won't have > happened yet, and cannot be predicted. I was assuming it would just behave like a 1-row table i.e. these would do the same thing: insert into foo values (1, 'frob') on conflict (a) do update set b = (select excluded.b || 'nitz'); insert into foo values (1, 'frob') on conflict (a) do update set b = (select b || 'nitz' from excluded); I'm actually kinda unsure why that doesn't already work. There may well be a very good reason, but my naive thought would be that if excluded doesn't have a range table entry, the first one would fail because excluded can't be looked up in the range table, and if it does have a range-table entry, then the second one would work because the from-clause reference would find it just like the qualified column reference did. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Jun 30, 2022 at 6:40 PM Peter Geoghegan <pg@bowt.ie> wrote: > My impression from reading this transcript is that the user was > confused as to why they needed to qualify the target table name in the > ON CONFLICT DO UPDATE's WHERE clause -- they didn't have to qualify it > in the targetlist that appears in "SET ... ", so why the need to do it > in the WHERE clause? This isn't something that upsert statements need > to do all that often, just because adding additional conditions to the > WHERE clause isn't usually necessary. That much makes sense to me -- I > *can* imagine how that could cause confusion. +1. I think that the issue here is simply that because both the updated table and the "excluded" pseudo-table are visible here, and have the same columns, an unqualified name is ambiguous. I don't really think that it's worth documenting. The error message you get if you fail to do it is actually pretty good: rhaas=# insert into foo values (1, 'frob') on conflict (a) do update set b = (select b || 'nitz'); ERROR: column reference "b" is ambiguous LINE 1: ...'frob') on conflict (a) do update set b = (select b || 'nitz... ^ Now you could read that and not understand that the ambiguity is between the target table and the "excluded" pseudo-table, for sure. But, would you think to check the documentation at that point? I'm not sure that's what people would really do. And if they did, I think that David's proposed patch would be unlikely to make them less confused. What would probably help more is adding something like this to the error message: HINT: column "b" could refer to any of these relations: "foo", "excluded" That could also help people who encounter this error in other situations. I'm not 100% sure this is a good idea, but I feel like it would have a much better chance of helping someone in this situation than the proposed doc patch. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I think that the issue here is simply that because both the updated > table and the "excluded" pseudo-table are visible here, and have the > same columns, an unqualified name is ambiguous. I don't really think > that it's worth documenting. The error message you get if you fail to > do it is actually pretty good: > ERROR: column reference "b" is ambiguous > Now you could read that and not understand that the ambiguity is > between the target table and the "excluded" pseudo-table, for sure. Agreed. It doesn't help that there's no explicit use of "excluded" anywhere, as there is in more usual ambiguous-column cases. > What would probably help more is adding something like this to the > error message: > HINT: column "b" could refer to any of these relations: "foo", "excluded" +1, that seems like it could be handy across the board. regards, tom lane
On Fri, Jul 1, 2022 at 6:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > What would probably help more is adding something like this to the > > error message: > > HINT: column "b" could refer to any of these relations: "foo", "excluded" > > +1, that seems like it could be handy across the board. The user *will* get a similar HINT if they happen to *also* spell the would-be ambiguous column name slightly incorrectly: ERROR: column "barr" does not exist LINE 1: ...lict (bar) do update set bar = excluded.bar where barr != 5; ^ HINT: Perhaps you meant to reference the column "foo.bar" or the column "excluded.bar". -- Peter Geoghegan
On Fri, Jul 1, 2022 at 6:01 AM Robert Haas <robertmhaas@gmail.com> wrote: > What would probably help more is adding something like this to the > error message: > > HINT: column "b" could refer to any of these relations: "foo", "excluded" > > That could also help people who encounter this error in other > situations. I'm not 100% sure this is a good idea, but I feel like it > would have a much better chance of helping someone in this situation > than the proposed doc patch. I agree with everything you've said here, though I am 100% sure that something like your proposed HINT would be a real usability win. The "Perhaps you meant to reference the column" HINT displayed when the user misspells a column is a surprisingly popular feature. I'm aware of quite a few people singing its praises on Twitter, for example. That hardly ever happens, even with features that we think of as high impact major features. So evidently users really value these kinds of quality of life improvements. -- Peter Geoghegan
On Fri, Jul 1, 2022 at 7:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Jul 1, 2022 at 6:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
> What would probably help more is adding something like this to the
> error message:
>
> HINT: column "b" could refer to any of these relations: "foo", "excluded"
>
> That could also help people who encounter this error in other
> situations. I'm not 100% sure this is a good idea, but I feel like it
> would have a much better chance of helping someone in this situation
> than the proposed doc patch.
I agree with everything you've said here, though I am 100% sure that
something like your proposed HINT would be a real usability win.
The "Perhaps you meant to reference the column" HINT displayed when
the user misspells a column is a surprisingly popular feature. I'm
aware of quite a few people singing its praises on Twitter, for
example. That hardly ever happens, even with features that we think of
as high impact major features. So evidently users really value these
kinds of quality of life improvements.
Without any other changes being made I'm content with the status quo calling excluded a table a reasonable approximation that hasn't been seen to be confusing to our readers.
That said, I still think that the current wording should be tweak with respect to row vs. rows (especially if we continue to call it a table):
Current:
"The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to theexisting row using the table's name (or an alias), and to [rows] proposed
for insertion using the special excluded table."
Change [rows] to:
"the row"
I'm undecided whether "FROM excluded" should be something that works - but I also don't think it would actually be used in any case.
David J.
On Fri, Jul 1, 2022 at 08:11:36AM -0700, David G. Johnston wrote: > That said, I still think that the current wording should be tweak with respect > to row vs. rows (especially if we continue to call it a table): > > Current: > "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the > existing row using the table's name (or an alias), and to [rows] proposed > for insertion using the special excluded table." > > Change [rows] to: > > "the row" > > > I'm undecided whether "FROM excluded" should be something that works - but I > also don't think it would actually be used in any case. I found two places where a singular "row" would be better, doc patch attached. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Attachment
On Fri, Jul 8, 2022 at 11:18 PM Bruce Momjian <bruce@momjian.us> wrote: > I found two places where a singular "row" would be better, doc patch > attached. +1. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Jul 8, 2022 at 11:18:43PM -0400, Bruce Momjian wrote: > On Fri, Jul 1, 2022 at 08:11:36AM -0700, David G. Johnston wrote: > > That said, I still think that the current wording should be tweak with respect > > to row vs. rows (especially if we continue to call it a table): > > > > Current: > > "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the > > existing row using the table's name (or an alias), and to [rows] proposed > > for insertion using the special excluded table." > > > > Change [rows] to: > > > > "the row" > > > > > > I'm undecided whether "FROM excluded" should be something that works - but I > > also don't think it would actually be used in any case. > > I found two places where a singular "row" would be better, doc patch > attached. Patch applied to all supported versions. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson