On Wednesday, July 13, 2022, André Hänsel <
andre@webkr.de> wrote:
The wiki entry https://wiki.postgresql.org/wiki/UPSERT kind of suggests that with the introduction of ON CONFLICT DO UPDATE the problem of upserts is solved. But is it?
A common use case for upserts is to keep a table up to date from an external data source. So you might have a cron job that runs a relatively large batch of upserts every couple of minutes.
I have found that this use case is not adequately covered by ON CONFLICT DO UPDATE for two reasons:
- New versions are created for all rows, even if the data is identical. This quickly fills up the WAL and puts unnecessary load on the tablespace drives.
- If the conflict target is not the serial column, the sequence backing the serial column gets incremented for every row. This quickly reaches the point where the serial for new rows exceeds the range of an integer.
Does this mean I have to SELECT the data first and do the conflict check in the application or is there a better SQL-only way?
Well, first of all, don’t use serial/auto-increment in this situation.
Second, you need to figure out what the unique key in the provided data is and key off of that for detecting duplicates. Ideally it already contains some bigint key column which then makes the first point easy to accept.
David J.