Re: upsert doesn't seem to work.. - Mailing list pgsql-sql
From | Ben Morrow |
---|---|
Subject | Re: upsert doesn't seem to work.. |
Date | |
Msg-id | 20130217094411.GA28188@anubis.morrow.me.uk Whole thread Raw |
In response to | upsert doesn't seem to work.. (Bert <biertie@gmail.com>) |
List | pgsql-sql |
Quoth biertie@gmail.com (Bert): > > We continuously load data from flat files in our database. > We first insert the data into unlogged tables (in the loadoltp schema), and > then we use the 'upsert' statement to transfer the data from the load table > into the tables we are going to use. > > The load tables are unlogged, and don't have indexes / pk's on them. All > our 'real tables', which contains the data, always have a pk consisting out > of 2 fields. In the example those are 'tick_server_id' and 'item_id'. > > At first everything seems to run ok, however it seems that new fields > aren't always inserted as desired. > > > This is an example query which causes troubles: That query is basically equivalent to something like create table "st_item" ( server_id integer, item_id integer, item_desc text, primary key(server_id, item_id) ); create table "st_item_insert" ( server_id integer, item_id integer, item_desc text ); with "upsert" as ( update "st_item" et set "item_desc" = e.item_desc from "st_item_insert" e whereet.server_id = e.server_id and et.item_id = e.item_id returning et.server_id, et.item_id ) insertinto "st_item" ("server_id", "item_id", "item_desc") select et.server_id, et.item_id, et.item_desc from"st_item_insert" et where et.server_id not in ( select et.server_id from "upsert" b) andet.item_id not in ( select et.item_id from "upsert" b) There are three problems here. The first is that the NOT IN subselect selects from et instead of from b. In the context of this subselect "et" is a table reference from outside the subselect, so it's treated as a constant for each run of the subselect. That means that the subselect will return the value you are testing against for every row in "upsert", so if there were any updates at all you will make no insertions. The second is that you are making two separate subselects. This means that a row in st_item_insert will not be inserted if there is a row in "upsert" with a matching server_id and a row in "upsert" with a matching item_id, *even if they are different rows*. For instance, suppose st_item_insert has 2 1 foo 1 2 bar 2 2 baz and the 'foo' and 'bar' entries get updated. The 'baz' entry will then not get inserted, because the first subselect will find the 'foo' row and the second will find the 'bar' row. What you need is a single row subselect, like this: where (et.server_id, et.item_id) not in ( select "server_id", "item_id" from "upsert") The third is that upsert is not as simple as you think. It isn't possible (at least, not in Postgres) to take a lock on a row which doesn't exist, so it's possible that a concurrent transaction could insert a row with a conflicting key between the time the UPDATE runs and the time the INSERT runs. You need to either lock the whole table or use the retry strategy documented in the 'Trapping Errors' section of the PL/pgSQL documentation. Annoyingly, even 9's serializable transactions don't seem to help here, at least not by my experiments. Ben