Thread: Is it possible to write a generic UPSERT?

Is it possible to write a generic UPSERT?

From

Mario Emmenlauer

Date:

12 November 2020, 13:58:37

Dear all,

I hope this request is not too naiive. Please ignore if its not
matching this lists focus, or if it just shows my lack of SQL
knowledge. Any pointer to documentation would be appreciated!

My question is about the "ON CONFLICT UPDATE" clause. We often have
one dedicated condition that we want to achieve: Insert an item if
it does not exist, or update all given fields if it does exist.

Your excellent documentation makes it easy to implement this in SQL.

But the statement is slightly complex to type, and I find me and my
colleagues often spend more time on this than I would hope. Our two
main challenges are:
(1) we have to look up the uniqueness constraints on the table, and
(2) we have to duplicate the insert statement in the UPDATE section
    again, because virtually all fields should get overwritten
    (except for the conflicting ones). On long inserts this can be
    quite annoying and error-prone.

I can see how "ON CONFLICT" is very powerful. But that power seems
often a burden for us. We would prefer something that is less manual
effort for the specific use case. Basically, we would like:
    INSERT if not exist, and
    UPDATE _all_ non-conflicting fields in case of _any_ conflict

In my (naiive) thinking, such a construct would cover 99% of our
use cases. Or did other people make very different experiences?


Now the question: I fail to generically implement this. I do not
necessarily mean that this has to be supported by PostgreSQL. Any
solution would work for us, be it in SQL, PostgreSQL or C/C++. For
example an acceptable solution would be a C/C++ wrapper method that
generates the statement, given the corresponding INSERT statement and
the types and values of all arguments. The wrapper would also have
access to the table definitions because we create all tables in code.

We currently do not have concurrency concerns so a more complex
solution using multiple statements would be fine.

Has anybody ever done something like this? Is there an SQL way to
achieve this? Or another programmatic way?

All the best,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/

Re: Is it possible to write a generic UPSERT?

From

Alban Hertroys

Date:

12 November 2020, 16:55:25

> On 12 Nov 2020, at 14:58, Mario Emmenlauer <mario@emmenlauer.de> wrote:

(…)

> But the statement is slightly complex to type, and I find me and my
> colleagues often spend more time on this than I would hope. Our two
> main challenges are:
> (1) we have to look up the uniqueness constraints on the table, and
> (2) we have to duplicate the insert statement in the UPDATE section
>    again, because virtually all fields should get overwritten
>    (except for the conflicting ones). On long inserts this can be
>    quite annoying and error-prone.
>
> I can see how "ON CONFLICT" is very powerful. But that power seems
> often a burden for us. We would prefer something that is less manual
> effort for the specific use case. Basically, we would like:
>    INSERT if not exist, and
>    UPDATE _all_ non-conflicting fields in case of _any_ conflict
>
> In my (naiive) thinking, such a construct would cover 99% of our
> use cases. Or did other people make very different experiences?

(…)

> Has anybody ever done something like this? Is there an SQL way to
> achieve this? Or another programmatic way?

We generate the SQL @work based on the definitions in, IIRC, the information_schema. It has tables for both the column
listsper table and the primary key definitions. 

With that, an SQL statement that returns the required SQL statement is easy to generate, after which you can execute it
eitherfrom a plpgsql execute statement in a function or in a do-block. 

We do this in plpgsql, but that’s mostly because this code is part of our ETL process and it has to perform some other
logicon the same data anyway. I could look up our code for you tomorrow, but that looks to be a busy day, so I can’t
promise.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: Is it possible to write a generic UPSERT?

From

Mario Emmenlauer

Date:

12 November 2020, 17:03:07

On 12.11.20 17:55, Alban Hertroys wrote:
>> On 12 Nov 2020, at 14:58, Mario Emmenlauer <mario@emmenlauer.de> wrote:
> 
> (…)
> 
>> But the statement is slightly complex to type, and I find me and my
>> colleagues often spend more time on this than I would hope. Our two
>> main challenges are:
>> (1) we have to look up the uniqueness constraints on the table, and
>> (2) we have to duplicate the insert statement in the UPDATE section
>>    again, because virtually all fields should get overwritten
>>    (except for the conflicting ones). On long inserts this can be
>>    quite annoying and error-prone.
>>
>> I can see how "ON CONFLICT" is very powerful. But that power seems
>> often a burden for us. We would prefer something that is less manual
>> effort for the specific use case. Basically, we would like:
>>    INSERT if not exist, and
>>    UPDATE _all_ non-conflicting fields in case of _any_ conflict
>>
>> In my (naiive) thinking, such a construct would cover 99% of our
>> use cases. Or did other people make very different experiences?
> 
> (…)
> 
>> Has anybody ever done something like this? Is there an SQL way to
>> achieve this? Or another programmatic way?
> 
> We generate the SQL @work based on the definitions in, IIRC, the information_schema. It has tables for both the
columnlists per table and the primary key definitions.
 
> 
> With that, an SQL statement that returns the required SQL statement is easy to generate, after which you can execute
iteither from a plpgsql execute statement in a function or in a do-block.
 


This is actually a very very interesting idea! I did not consider
that we could completely generate the statements based on the actual
table information from the information_schema. I need to give this a
bit more thought but I very much like the idea, thanks for pushing me
in a new direction!

All the best,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/

Re: Is it possible to write a generic UPSERT?

From

Michael Lewis

Date:

12 November 2020, 17:34:58

On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer <mario@emmenlauer.de> wrote:

I can see how "ON CONFLICT" is very powerful. But that power seems
often a burden for us. We would prefer something that is less manual
effort for the specific use case. Basically, we would like:
INSERT if not exist, and
UPDATE _all_ non-conflicting fields in case of _any_ conflict

If you do not have significant ratio of HOT (heap only tuple) updates that you want to preserve and you don't have sequences that are GENERATED AS ALWAYS (rather than BY DEFAULT), you could consider just doing a DELETE where the keys exist, then insert all the rows. It should be trivial to figure out the primary key or other unique index to match on.

MERGE command is implemented for this use case in some DBMS, but not Postgres (yet?).

Re: Is it possible to write a generic UPSERT?

From

Andreas Kretschmer

Date:

12 November 2020, 19:23:06


Am 12.11.20 um 18:34 schrieb Michael Lewis:
> MERGE command is implemented for this use case in some DBMS, but not 
> Postgres (yet?).

MERGE is available in 2ndqPostgres, but that's not open source (it's 
available for 2ndQ-customers on request).


Regards, Andreas

-- 
2ndQuadrant, an EDB company
www.2ndQuadrant.com / www.enterprisedb.com

Re: Is it possible to write a generic UPSERT?

From

Mario Emmenlauer

Date:

13 November 2020, 09:28:01

On 12.11.20 18:34, Michael Lewis wrote:
> On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer <mario@emmenlauer.de <mailto:mario@emmenlauer.de>> wrote:
> 
>     I can see how "ON CONFLICT" is very powerful. But that power seems
>     often a burden for us. We would prefer something that is less manual
>     effort for the specific use case. Basically, we would like:
>         INSERT if not exist, and
>         UPDATE _all_ non-conflicting fields in case of _any_ conflict
> 
> 
> If you do not have significant ratio of HOT (heap only tuple) updates that you want to preserve and you don't have
sequencesthat are GENERATED AS ALWAYS
 
> (rather than BY DEFAULT), you could consider just doing a DELETE where the keys exist, then insert all the rows. It
shouldbe trivial to figure out the primary
 
> key or other unique index to match on.
> 
> MERGE command is implemented for this use case in some DBMS, but not Postgres (yet?).

Actually I'm quite happy you suggest this, because its the one thing
that also came to my mind :) I was not really sure if this is a sane
idea. But it seems this could be the "easiest" way out, because its
short (like "easy to write") and works for all our use cases.

The main potential problem for my use case may be cascading deletes,
that I think we may want to use in the long run.

All the best,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/