Thread: Upsert error "column reference is ambiguous"

Upsert error "column reference is ambiguous"

From
Tim Starling
Date:
Regarding upsert syntax.

psql (16.8 (Ubuntu 16.8-0ubuntu0.24.04.1), server 14.13 (Ubuntu 
14.13-0ubuntu0.22.04.1))
=> CREATE TABLE t (k INTEGER, v INTEGER);
=> CREATE UNIQUE INDEX t_k ON t (k);
=> INSERT INTO t VALUES (1,1);
INSERT 0 1
=> INSERT INTO t VALUES (1,1) ON CONFLICT (k) DO UPDATE SET v=v+1;
ERROR:  column reference "v" is ambiguous

Please convince me that this is not a bug.

If I understand correctly, in the expression "v+1", both EXCLUDED.v 
and t.v are present as the unqualified name "v". This is always the 
case and it is never possible to reference an unqualified field name 
in the expression of a conflict action.

Thus, any query with an unqualified name is statically known to be 
invalid. It is not a b/c break to remove EXCLUDED.v from the list of 
unqualified fields in a new major release of PG, thus allowing it to DWIM.

I'm a maintainer of MediaWiki. Some kind person contributed PostgreSQL 
support many years ago so now I am required to maintain it in 
perpetuity. The work seems out of proportion to the benefit, but 
that's the industry I guess. A handful of users benefit, such as 
wiki.postgresql.org.

Our application has an upsert method which takes the assignment 
"v=v+1" as a string. It is feasible to split it on the equals sign 
into the destination field and expression components, but it is not 
feasible to parse the expression or to require callers to supply an 
AST tree for the expressions they give us. It is not feasible to 
require callers to prefix all field names with the table name.

We currently emulate upsert on PostgreSQL using several awkward and 
inefficient queries. It would be nice to be able to use PostgreSQL's 
native upsert feature. But unless someone here has an idea for a 
workaround, I think this field name resolution policy is a total 
blocker. We can implement upsert on MySQL and SQLite but on PostgreSQL 
it will remain emulated.

-- Tim Starling





Re: Upsert error "column reference is ambiguous"

From
"David G. Johnston"
Date:
On Sunday, April 27, 2025, Tim Starling <tstarling@wikimedia.org> wrote:
thus allowing it to DWIM.

We intentionally choose (or, in any case have established) a SWYM approach here.

Personally I’d be fine with the reduced helpfulness in trying to prevent buggy queries in the interest of being more conforming with the broader world.  I am curious as to whether we are in strict adherence to the SQL Standard on this point though.  Makes deviation a bit tougher to justify.

It does seem that project policies would prevent back-patching such a change.

David J.

Re: Upsert error "column reference is ambiguous"

From
Tom Lane
Date:
Tim Starling <tstarling@wikimedia.org> writes:
> Regarding upsert syntax.
> => INSERT INTO t VALUES (1,1) ON CONFLICT (k) DO UPDATE SET v=v+1;
> ERROR:  column reference "v" is ambiguous

> Please convince me that this is not a bug.

It's not a bug.

> If I understand correctly, in the expression "v+1", both EXCLUDED.v 
> and t.v are present as the unqualified name "v". This is always the 
> case and it is never possible to reference an unqualified field name 
> in the expression of a conflict action.

Correct: it's not clear whether you mean to use "v" from the new
desired-to-be-inserted row or "v" from the existing row.

> Thus, any query with an unqualified name is statically known to be 
> invalid. It is not a b/c break to remove EXCLUDED.v from the list of 
> unqualified fields in a new major release of PG, thus allowing it to DWIM.

Even if I were on board with arbitrarily adopting one of the two
possible interpretations, it's far from obvious to me that most people
would agree that "v" should mean the value from the existing row,
rather than the new value.  Better to make them say which they want.

            regards, tom lane



Re: Upsert error "column reference is ambiguous"

From
Tim Starling
Date:
On 28/4/25 20:54, Tom Lane wrote:
> Even if I were on board with arbitrarily adopting one of the two
> possible interpretations, it's far from obvious to me that most people
> would agree that "v" should mean the value from the existing row,
> rather than the new value.  Better to make them say which they want.

OK sure, no way to tell, but if every other DBMS does it the same way 
then that might be a hint.

Also, I'm just saying, the upsert feature is fully useless to me with 
this name resolution policy.

In the single-row case, there's no need for EXCLUDED at all, because 
the client knows everything about the excluded row. Recall my example:

INSERT INTO t VALUES (1,1) ON CONFLICT (k) DO UPDATE SET v=v+1;

If I meant SET v=EXCLUDED.v+1 I would have just written v=2. The 
default policy (in other DBMSes) follows by analogy from the 
single-row case.

-- Tim Starling




Re: Upsert error "column reference is ambiguous"

From
Laurenz Albe
Date:
On Mon, 2025-04-28 at 21:22 +1000, Tim Starling wrote:
> On 28/4/25 20:54, Tom Lane wrote:
> > Even if I were on board with arbitrarily adopting one of the two
> > possible interpretations, it's far from obvious to me that most people
> > would agree that "v" should mean the value from the existing row,
> > rather than the new value.  Better to make them say which they want.
>
> OK sure, no way to tell, but if every other DBMS does it the same way
> then that might be a hint.

Which DBMS that supports INSERT .. ON CONFLICT do you have in mind?

> Also, I'm just saying, the upsert feature is fully useless to me with
> this name resolution policy.

Because you cannot write EXCLUDED?

> In the single-row case, there's no need for EXCLUDED at all, because
> the client knows everything about the excluded row. Recall my example:
>
> INSERT INTO t VALUES (1,1) ON CONFLICT (k) DO UPDATE SET v=v+1;
>
> If I meant SET v=EXCLUDED.v+1 I would have just written v=2. The
> default policy (in other DBMSes) follows by analogy from the
> single-row case.

Actually, for many people, the DWIM would be the other way around:

INSERT INTO tab (col)
SELECT something FROM othertab
ON CONFLICT (id)
/* "col" should get set to "something" */
DO UPDATE SET col = col;

I can follow your reasoning above, but if the SQL parser tried to
guess the user's intention like that, it is likely to go wrong
sometimes.  As Tom said, better force the user to be explicit.

Yours,
Laurenz Albe



Re: Upsert error "column reference is ambiguous"

From
Peter Geoghegan
Date:
On Mon, Apr 28, 2025 at 12:56 AM Tim Starling <tstarling@wikimedia.org> wrote:
> Our application has an upsert method which takes the assignment
> "v=v+1" as a string. It is feasible to split it on the equals sign
> into the destination field and expression components, but it is not
> feasible to parse the expression or to require callers to supply an
> AST tree for the expressions they give us. It is not feasible to
> require callers to prefix all field names with the table name.

You can use an alias for the target table name. Is it feasible to
require callers to prefix all field names with a generic table name
alias?

--
Peter Geoghegan



Re: Upsert error "column reference is ambiguous"

From
Tom Lane
Date:
Tim Starling <tstarling@wikimedia.org> writes:
> On 28/4/25 20:54, Tom Lane wrote:
>> Even if I were on board with arbitrarily adopting one of the two
>> possible interpretations, it's far from obvious to me that most people
>> would agree that "v" should mean the value from the existing row,
>> rather than the new value.  Better to make them say which they want.

> OK sure, no way to tell, but if every other DBMS does it the same way 
> then that might be a hint.

AFAIK, "ON CONFLICT" is a Postgres-ism.  Exactly which constructs
in exactly which other databases are you citing as precedent?

> In the single-row case, there's no need for EXCLUDED at all, because 
> the client knows everything about the excluded row.

Laurenz already provided the counter-example of an INSERT/SELECT,
but there's also the possibility of the INSERT supplying a computed
default value for a column, e.g., CURRENT_TIMESTAMP.  So you won't
get far with that argument.

I do actually have some sympathy for your proposal after thinking
about it a bit more, but the argument I would use is "the behavior
of the ON CONFLICT UPDATE SET list should be as much as possible like
the behavior of an ordinary UPDATE's SET list".  Since "v = v+1" would
refer to the existing row's "v" in regular UPDATE, it's sensible to
let that happen here too.  Of course the counter-argument is that this
should be compared not to a trivial UPDATE, but an "UPDATE ... FROM
othertable" where the othertable supplies some conflicting column
name(s).  In that situation we're going to make you resolve the
conflict by qualifying the column names.  The only thing that makes
that not a precise parallel is that EXCLUDED is not something the user
wrote into the query explicitly, so there's no opportunity to
substitute different column aliases, as a FROM clause would allow.
Perhaps that justifies demoting it to second-class citizenship whereby
EXCLUDED has to be qualified but the target table doesn't.  (I don't
find this argument hugely compelling, but it's an argument.)

BTW, I did wonder how hard it would be to make such a change.
On first glance it seems to be a one-liner:

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 1f4d6adda52..f11727adbaa 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1306,7 +1306,7 @@ transformOnConflictClause(ParseState *pstate,
          * Add the EXCLUDED pseudo relation to the query namespace, making it
          * available in the UPDATE subexpressions.
          */
-        addNSItemToQuery(pstate, exclNSItem, false, true, true);
+        addNSItemToQuery(pstate, exclNSItem, false, true, false);
 
         /*
          * Now transform the UPDATE subexpressions.

So this isn't about implementation difficulty but about whether
we think it's a good idea.

            regards, tom lane



Re: Upsert error "column reference is ambiguous"

From
"David G. Johnston"
Date:

On Monday, April 28, 2025, Tom Lane <tgl@sss.pgh.pa.us> wrote:

AFAIK, "ON CONFLICT" is a Postgres-ism.  Exactly which constructs
in exactly which other databases are you citing as precedent?

I confirmed the SQLite reference from the original email.

The upsert above inserts the new vocabulary word "jovial" if that word is not already in the dictionary, or if it is already in the dictionary, it increments the counter. The "count+1" expression could also be written as "vocabulary.count". PostgreSQL requires the second form, but SQLite accepts either.”


David J.