Thread: Is it possible to write a generic UPSERT?
Dear all, I hope this request is not too naiive. Please ignore if its not matching this lists focus, or if it just shows my lack of SQL knowledge. Any pointer to documentation would be appreciated! My question is about the "ON CONFLICT UPDATE" clause. We often have one dedicated condition that we want to achieve: Insert an item if it does not exist, or update all given fields if it does exist. Your excellent documentation makes it easy to implement this in SQL. But the statement is slightly complex to type, and I find me and my colleagues often spend more time on this than I would hope. Our two main challenges are: (1) we have to look up the uniqueness constraints on the table, and (2) we have to duplicate the insert statement in the UPDATE section again, because virtually all fields should get overwritten (except for the conflicting ones). On long inserts this can be quite annoying and error-prone. I can see how "ON CONFLICT" is very powerful. But that power seems often a burden for us. We would prefer something that is less manual effort for the specific use case. Basically, we would like: INSERT if not exist, and UPDATE _all_ non-conflicting fields in case of _any_ conflict In my (naiive) thinking, such a construct would cover 99% of our use cases. Or did other people make very different experiences? Now the question: I fail to generically implement this. I do not necessarily mean that this has to be supported by PostgreSQL. Any solution would work for us, be it in SQL, PostgreSQL or C/C++. For example an acceptable solution would be a C/C++ wrapper method that generates the statement, given the corresponding INSERT statement and the types and values of all arguments. The wrapper would also have access to the table definitions because we create all tables in code. We currently do not have concurrency concerns so a more complex solution using multiple statements would be fine. Has anybody ever done something like this? Is there an SQL way to achieve this? Or another programmatic way? All the best, Mario Emmenlauer -- BioDataAnalysis GmbH, Mario Emmenlauer Tel. Buero: +49-89-74677203 Balanstr. 43 mailto: memmenlauer * biodataanalysis.de D-81669 München http://www.biodataanalysis.de/
> On 12 Nov 2020, at 14:58, Mario Emmenlauer <mario@emmenlauer.de> wrote: (…) > But the statement is slightly complex to type, and I find me and my > colleagues often spend more time on this than I would hope. Our two > main challenges are: > (1) we have to look up the uniqueness constraints on the table, and > (2) we have to duplicate the insert statement in the UPDATE section > again, because virtually all fields should get overwritten > (except for the conflicting ones). On long inserts this can be > quite annoying and error-prone. > > I can see how "ON CONFLICT" is very powerful. But that power seems > often a burden for us. We would prefer something that is less manual > effort for the specific use case. Basically, we would like: > INSERT if not exist, and > UPDATE _all_ non-conflicting fields in case of _any_ conflict > > In my (naiive) thinking, such a construct would cover 99% of our > use cases. Or did other people make very different experiences? (…) > Has anybody ever done something like this? Is there an SQL way to > achieve this? Or another programmatic way? We generate the SQL @work based on the definitions in, IIRC, the information_schema. It has tables for both the column listsper table and the primary key definitions. With that, an SQL statement that returns the required SQL statement is easy to generate, after which you can execute it eitherfrom a plpgsql execute statement in a function or in a do-block. We do this in plpgsql, but that’s mostly because this code is part of our ETL process and it has to perform some other logicon the same data anyway. I could look up our code for you tomorrow, but that looks to be a busy day, so I can’t promise. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
On 12.11.20 17:55, Alban Hertroys wrote: >> On 12 Nov 2020, at 14:58, Mario Emmenlauer <mario@emmenlauer.de> wrote: > > (…) > >> But the statement is slightly complex to type, and I find me and my >> colleagues often spend more time on this than I would hope. Our two >> main challenges are: >> (1) we have to look up the uniqueness constraints on the table, and >> (2) we have to duplicate the insert statement in the UPDATE section >> again, because virtually all fields should get overwritten >> (except for the conflicting ones). On long inserts this can be >> quite annoying and error-prone. >> >> I can see how "ON CONFLICT" is very powerful. But that power seems >> often a burden for us. We would prefer something that is less manual >> effort for the specific use case. Basically, we would like: >> INSERT if not exist, and >> UPDATE _all_ non-conflicting fields in case of _any_ conflict >> >> In my (naiive) thinking, such a construct would cover 99% of our >> use cases. Or did other people make very different experiences? > > (…) > >> Has anybody ever done something like this? Is there an SQL way to >> achieve this? Or another programmatic way? > > We generate the SQL @work based on the definitions in, IIRC, the information_schema. It has tables for both the columnlists per table and the primary key definitions. > > With that, an SQL statement that returns the required SQL statement is easy to generate, after which you can execute iteither from a plpgsql execute statement in a function or in a do-block. This is actually a very very interesting idea! I did not consider that we could completely generate the statements based on the actual table information from the information_schema. I need to give this a bit more thought but I very much like the idea, thanks for pushing me in a new direction! All the best, Mario Emmenlauer -- BioDataAnalysis GmbH, Mario Emmenlauer Tel. Buero: +49-89-74677203 Balanstr. 43 mailto: memmenlauer * biodataanalysis.de D-81669 München http://www.biodataanalysis.de/
On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer <mario@emmenlauer.de> wrote:
I can see how "ON CONFLICT" is very powerful. But that power seems
often a burden for us. We would prefer something that is less manual
effort for the specific use case. Basically, we would like:
INSERT if not exist, and
UPDATE _all_ non-conflicting fields in case of _any_ conflict
If you do not have significant ratio of HOT (heap only tuple) updates that you want to preserve and you don't have sequences that are GENERATED AS ALWAYS (rather than BY DEFAULT), you could consider just doing a DELETE where the keys exist, then insert all the rows. It should be trivial to figure out the primary key or other unique index to match on.
MERGE command is implemented for this use case in some DBMS, but not Postgres (yet?).
Am 12.11.20 um 18:34 schrieb Michael Lewis: > MERGE command is implemented for this use case in some DBMS, but not > Postgres (yet?). MERGE is available in 2ndqPostgres, but that's not open source (it's available for 2ndQ-customers on request). Regards, Andreas -- 2ndQuadrant, an EDB company www.2ndQuadrant.com / www.enterprisedb.com
On 12.11.20 18:34, Michael Lewis wrote: > On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer <mario@emmenlauer.de <mailto:mario@emmenlauer.de>> wrote: > > I can see how "ON CONFLICT" is very powerful. But that power seems > often a burden for us. We would prefer something that is less manual > effort for the specific use case. Basically, we would like: > INSERT if not exist, and > UPDATE _all_ non-conflicting fields in case of _any_ conflict > > > If you do not have significant ratio of HOT (heap only tuple) updates that you want to preserve and you don't have sequencesthat are GENERATED AS ALWAYS > (rather than BY DEFAULT), you could consider just doing a DELETE where the keys exist, then insert all the rows. It shouldbe trivial to figure out the primary > key or other unique index to match on. > > MERGE command is implemented for this use case in some DBMS, but not Postgres (yet?). Actually I'm quite happy you suggest this, because its the one thing that also came to my mind :) I was not really sure if this is a sane idea. But it seems this could be the "easiest" way out, because its short (like "easy to write") and works for all our use cases. The main potential problem for my use case may be cascading deletes, that I think we may want to use in the long run. All the best, Mario Emmenlauer -- BioDataAnalysis GmbH, Mario Emmenlauer Tel. Buero: +49-89-74677203 Balanstr. 43 mailto: memmenlauer * biodataanalysis.de D-81669 München http://www.biodataanalysis.de/