Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT - Mailing list pgsql-hackers
From | David G. Johnston |
---|---|
Subject | Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT |
Date | |
Msg-id | CAKFQuwZCb3SuuwzLJ7nsoYOUmXJrjvhAZ76vswn-FeX78L_oGw@mail.gmail.com Whole thread Raw |
In response to | Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT
|
List | pgsql-hackers |
On Thu, Jun 30, 2022 at 2:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Jun 30, 2022 at 2:07 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Current:
> "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
> existing row using the table's name (or an alias), and to [rows] proposed
> for insertion using the special excluded table."
>
> The word table in that sentence is wrong and not a useful way to think of the thing which we've named excluded. It is a single value of a composite type having the structure of the named table.
I think that your reasoning is correct, but I don't agree with your
conclusion. The term "special excluded table" is a fudge, but that
isn't necessarily a bad thing. Sure, we could add something about the
UPDATE being similar to an UPDATE with a self-join, as I said
upthread. But I think that that would make the concept harder to
grasp.
I don't think incorporating self-joining to be helpful; the status quo is better than that. I believe people mostly think of "composite variable" from the current description even if we don't use those words - or such a concept can be explained by analogy with NEW and OLD (I think of it like a trigger, only that SQL doesn't have variables so we cannot use that term, hence just using "name").
> I'll agree that most people will mentally paper over the difference and go merrily on their way. At least one person recently did not do that, which prompted an email to the community
Can you provide a reference for this? Didn't see anything like that in
the reference you gave upthread.
OK, the discussion I am recalling happened on Discord hence the lack of a link.
On roughly 3/8 the following conversation occurred (I've trimmed out some non-relevant comments):
>>>OP
Hello, I have a simple question.
My table has a column 'transaction_date'I have an insert statement with an ON CONFLICT statement
I update using the 'excluded' values, but I only want to update if the transaction date is the same or newer.
Do I just use: "WHERE EXCLUDED.transaction_date >= transaction_date"?
so the full query is something like: INSERT INTO table VALUES (pk, yadda1, yadda2) ON CONFLICT (pk) DO UPDATE SET (yadda1 = EXCLUDED.yadda1, yadda2 = EXCLUDED.yadda2) WHERE EXCLUDED.transaction_date >= transaction_date;
>>>Other Person
I mean, the ... like 3 examples imply what it contains, and it vaguely says "and to rows proposed for insertion using the special excluded table." but...
Still, based on the BNF, that should work as you stated it.>>>OP
would perhaps it try to overwrite more than one row because many rows would meet the criteria?
It seems like it limits it to the conflict row but..>>>Other Person
Well, you're only conflicting on the PK, which is guaranteed to be unique.
>>>OP
>>>OP
Ah, so then it is limited to that row if it is specified within the ON CONFLICT action if I am reading correct.
[...]
If it matters to you, the only thing I got wrong apparently (in my limited non-sufficient testing) is that to access the current value within the table row you must use the table name. So: WHERE EXCLUDED.transaction_date >= tableName.transaction_date
>>>ME
"have access [...] to rows proposed for insertion using the special excluded table.". You have an update situation where two tables (the target and "excluded") are in scope with the exact same column names (by definition) so any column references in the value expressions need to be prefixed with which of the two tables you want to examine. As with a normal UPDATE, the left side of the SET clause entry must reference the target table and so its column cannot, and must not, be table qualified.
While it speaks of "rows" this is basically a per-row thing. As each row is tested and finds a conflict the update is executed."have access [...] to rows proposed for insertion using the special excluded table.". You have an update situation where two tables (the target and "excluded") are in scope with the exact same column names (by definition) so any column references in the value expressions need to be prefixed with which of the two tables you want to examine. As with a normal UPDATE, the left side of the SET clause entry must reference the target table and so its column cannot, and must not, be table qualified.
>>>Other Person
Mentioning something as critical as that offhand is a mistake IMO. It should have its own section.
It's also not mentioned in the BNF, though it shows up in the examples. You have to basically infer everything.
It's also not mentioned in the BNF, though it shows up in the examples. You have to basically infer everything.
>>>ME
The exact wording of the conflict_action description in head is: "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table." I haven't read anything here that gives me a hint as to how that ended up misinterpreted so that I could possibly formulate an alternative wording. And I cannot think of a more appropriate place to locate that sentence either. The examples do cover this and the specifics here are not something that we try to represent in BNF.
I'd probably change "and to rows proposed for insertion" to "and to the corresponding row proposed for insertion".
I'd probably change "and to rows proposed for insertion" to "and to the corresponding row proposed for insertion".
>>>OP
This does not change the original conclusion we arrived at correct? If I am reading what you are saying right, since it only discovered the conflict after examining the row, then by the same token it will only affect the same row where the conflict was detected.
I have a hard time imagining a user that reads the INSERT docs and
imagines that "excluded" is a relation that is visible to the query in
ways that are not limited to expression evaluation for the UPDATE's
WHERE/SET.
Yes, and based on a single encounter I agree this doesn't seem like a broadly encountered issue. My takeaway from that eventually led to this proposal. The "Other Person" who is complaining about the docs is one of the mentors on the Discord server and works for one of the corporate contributors to the community. (I suppose Discord is considered public so maybe this redaction is unnecessary...)
David J.
pgsql-hackers by date: