Thread: how to return data from insert into ... on conflict ... returning ... into

Consider this example 


drop table if exists tbl;

drop sequence if exists seq;

create sequence seq;

create table tbl(

id int8 not null primary key,

d bytea not null,

h bytea not null, -- hash of the data, calculated automatically

dummy byte default 0 -- dummy value, see below...

);

alter table tbl add constraint uidx_tbl_h unique(h);


create or replace function trg() returns trigger language plpgsql as

$function$

begin

new.h = sha256(new.d); -- auto-hash

if new.id is null then

new.id = nextval('seq');

end if;

return new;

end;

$function$;

create trigger trg before insert or update on tbl for each row execute procedure trg();


The  hash "h" is calculated automatically for each data value "d", and it is a unique value. Let's suppose that for technical reasons, we want the primary key to be an int8 value, and foreign keys in other tables will be referencing tbl records using the sequentially generated tbl.id values.

The basic idea is that for already existing "d" values, we do not insert a new record, but use the existing record and its identifier.

This code actually works:

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

insert into tbl(d) values ('1') on conflict(h) do update set dummy=0 returning id into aid; -- ok;

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') on conflict(h) do update set dummy=0 returning id into aid; -- ok;

raise notice '2->%', aid; -- ok

end;

$body$;


It will display the same id value for the same data values. But it updates the record even when it does not need to be updated. 

This code below does not work:

delete from tbl;

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

insert into tbl(d) values ('1') on conflict(h) do nothing returning id into aid; -- ok;

raise notice '1->%', aid; -- null ??????

insert into tbl(d) values ('1') on conflict(h) do nothing returning id into aid; -- ok;

raise notice '2->%', aid; -- null ??????

end;

$body$;


First it displays two non-null identifiers, then it displays two NULL values.

The alternative would be something like this:


delete from tbl;

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

select id from tbl into aid where h = sha256('1');

if not found then

insert into tbl(d) values ('1') on conflict(h) do update set id=id+0 returning id into aid;

end if;

raise notice '1->%', aid; -- null ??????

select id from tbl into aid where h = sha256('2');

if not found then

insert into tbl(d) values ('2') on conflict(h) do update set id=id+0 returning id into aid;

end if;

raise notice '2->%', aid; -- null ??????

end;

$body$;


But there are several problems with this "solution":

1. Running select to check for existence introduces a race condition. Of course it also depends on the transaction isolation, but in general it is not guaranteed that the insert won't fail with the unique constraint on h, even if the select did not find a matching record. (It might also introduce a deadlock?) I might be wrong on this, I don't really know how plpgsql procedures are executed, but I suspect that they can run in parallel.
2. Notice how the code changed. The calculation of h is done in the trigger and also at two other places. This was a very trivial example, but in a real world scenario, the calculation can be costly, and even if the above solution works, it must calculate the possibly conflicting values twice. It is ineffective, and it also requires to factor out the calculations to separate functions (or even worse, duplicate the code for the calculations). Even if calculations are not costly, this "solution" may introduce a dependency hell, because the trg() trigger can access field values that are calculated by other triggers that depend on each other. One would have to pre-calculate everything at every place where "insert into" is needed for the table, duplicating code and/or factoring the calculations out to a function with many parameters.
3. This trivial example only had a single unique constraint, but there could be more. When you have many unique constraints, then you have to write multiple SELECT statements to check for existence before doing the INSERT. Then the above "solution" becomes ugly and questionable.

Please note that adding a dummy byte does not solve the problem, because it will always update the record, even if it does not need to be updated. Degrades performance, possibly executes other triggers that do unwanted modifications to the database.

I have read the documentation here 


and the only possible actions are "do nothing" and "do update". The "do nothing" does not update the record, that is clear. But it also does not return any data, and that was not obvious to me. Why can't it return the data from the conflicted record? After the instruction is completed, "GET DIAGNOSTICS" could still be used to check the number of updates.

I understand that changing this behaviour may break backward compatibility. Would it be possible to introduce a "DO RETURN" clause that returns the data, even when there was a conflict?

Are there any better alternatives?

Thank you,

    Laszlo

On Tue, Jun 20, 2023 at 1:26 PM Les <nagylzs@gmail.com> wrote:
>
> https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT
>
> and the only possible actions are "do nothing" and "do update". The "do nothing" does not update the record, that is
clear.But it also does not return any data, and that was not obvious to me. Why can't it return the data from the
conflictedrecord? After the instruction is completed, "GET DIAGNOSTICS" could still be used to check the number of
updates.

Would it be fair to summarize that you'd like a feature where, upon
conflict, the command behaves as SELECT, and returns row that caused
the conflict?

The docs say:  "If the INSERT command contains a RETURNING clause, the
result will be similar to that of a SELECT statement containing the
columns and values defined in the RETURNING list, computed over the
row(s) inserted or updated by the command."

So in your desired behaviour, the RETURNING list would be computed
over the rows that cause the conflict.

> I understand that changing this behaviour may break backward compatibility. Would it be possible to introduce a "DO
RETURN"clause that returns the data, even when there was a conflict? 

If this behaviour is introduced with a new syntax, then there won't be
any fears of breaking backwards compatibility. I think DO SELECT as
the new syntax will make the intent clear. Taking one of your queries
as an example, after the new syntax it will look as follows:

insert into tbl(d) values ('1') on conflict(h) DO SELECT returning id into aid;

Best regards,
Gurjeet
http://Gurje.et