how to return data from insert into ... on conflict ... returning ... into - Mailing list pgsql-general

From Les
Subject how to return data from insert into ... on conflict ... returning ... into
Date
Msg-id CAKXe9UAuza2LTAJm_rH7JtXxSLg8mSfTMOfqLjmUwsDQCK7JUQ@mail.gmail.com
Whole thread Raw
Responses Re: how to return data from insert into ... on conflict ... returning ... into
List pgsql-general
Consider this example 


drop table if exists tbl;

drop sequence if exists seq;

create sequence seq;

create table tbl(

id int8 not null primary key,

d bytea not null,

h bytea not null, -- hash of the data, calculated automatically

dummy byte default 0 -- dummy value, see below...

);

alter table tbl add constraint uidx_tbl_h unique(h);


create or replace function trg() returns trigger language plpgsql as

$function$

begin

new.h = sha256(new.d); -- auto-hash

if new.id is null then

new.id = nextval('seq');

end if;

return new;

end;

$function$;

create trigger trg before insert or update on tbl for each row execute procedure trg();


The  hash "h" is calculated automatically for each data value "d", and it is a unique value. Let's suppose that for technical reasons, we want the primary key to be an int8 value, and foreign keys in other tables will be referencing tbl records using the sequentially generated tbl.id values.

The basic idea is that for already existing "d" values, we do not insert a new record, but use the existing record and its identifier.

This code actually works:

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

insert into tbl(d) values ('1') on conflict(h) do update set dummy=0 returning id into aid; -- ok;

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') on conflict(h) do update set dummy=0 returning id into aid; -- ok;

raise notice '2->%', aid; -- ok

end;

$body$;


It will display the same id value for the same data values. But it updates the record even when it does not need to be updated. 

This code below does not work:

delete from tbl;

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

insert into tbl(d) values ('1') on conflict(h) do nothing returning id into aid; -- ok;

raise notice '1->%', aid; -- null ??????

insert into tbl(d) values ('1') on conflict(h) do nothing returning id into aid; -- ok;

raise notice '2->%', aid; -- null ??????

end;

$body$;


First it displays two non-null identifiers, then it displays two NULL values.

The alternative would be something like this:


delete from tbl;

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

select id from tbl into aid where h = sha256('1');

if not found then

insert into tbl(d) values ('1') on conflict(h) do update set id=id+0 returning id into aid;

end if;

raise notice '1->%', aid; -- null ??????

select id from tbl into aid where h = sha256('2');

if not found then

insert into tbl(d) values ('2') on conflict(h) do update set id=id+0 returning id into aid;

end if;

raise notice '2->%', aid; -- null ??????

end;

$body$;


But there are several problems with this "solution":

1. Running select to check for existence introduces a race condition. Of course it also depends on the transaction isolation, but in general it is not guaranteed that the insert won't fail with the unique constraint on h, even if the select did not find a matching record. (It might also introduce a deadlock?) I might be wrong on this, I don't really know how plpgsql procedures are executed, but I suspect that they can run in parallel.
2. Notice how the code changed. The calculation of h is done in the trigger and also at two other places. This was a very trivial example, but in a real world scenario, the calculation can be costly, and even if the above solution works, it must calculate the possibly conflicting values twice. It is ineffective, and it also requires to factor out the calculations to separate functions (or even worse, duplicate the code for the calculations). Even if calculations are not costly, this "solution" may introduce a dependency hell, because the trg() trigger can access field values that are calculated by other triggers that depend on each other. One would have to pre-calculate everything at every place where "insert into" is needed for the table, duplicating code and/or factoring the calculations out to a function with many parameters.
3. This trivial example only had a single unique constraint, but there could be more. When you have many unique constraints, then you have to write multiple SELECT statements to check for existence before doing the INSERT. Then the above "solution" becomes ugly and questionable.

Please note that adding a dummy byte does not solve the problem, because it will always update the record, even if it does not need to be updated. Degrades performance, possibly executes other triggers that do unwanted modifications to the database.

I have read the documentation here 


and the only possible actions are "do nothing" and "do update". The "do nothing" does not update the record, that is clear. But it also does not return any data, and that was not obvious to me. Why can't it return the data from the conflicted record? After the instruction is completed, "GET DIAGNOSTICS" could still be used to check the number of updates.

I understand that changing this behaviour may break backward compatibility. Would it be possible to introduce a "DO RETURN" clause that returns the data, even when there was a conflict?

Are there any better alternatives?

Thank you,

    Laszlo

pgsql-general by date:

Previous
From: David Rowley
Date:
Subject: Re: pb with join plan
Next
From: Marc Millas
Date:
Subject: Re: pb with join plan