Re: Revisiting UPDATE FROM ... ORDER BY not respected - Mailing list pgsql-general
From | Carlo Stonebanks |
---|---|
Subject | Re: Revisiting UPDATE FROM ... ORDER BY not respected |
Date | |
Msg-id | BLU0-SMTP488332F93CDFF508349E3296AA0@phx.gbl Whole thread Raw |
In response to | Re: Revisiting UPDATE FROM ... ORDER BY not respected (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
Thanks Tom, >> I don't think this really works if multiple processes try to update the table concurrently --- does that ever happen in your apps? << Technically possible, but the production reality makes it unlikely. Operationally, it makes no sense for it to be run more than once, or by more than one person or against the same table as it represnts a data import (will only be run once to prepare the data). I will ask production about this to be sure. Like I said, I am baffled >> What PG versions are they using? Pre-8.2 this couldn't have been relied on at all, because the planner would still try to flatten the subselect. << In this case, 8.3. >> I rather wonder why you're insisting on sequential assignments at all. << The relation "my_schema.my_table" is actually a table that represents a set of operations to a data import process. The order of processing is represented by the field "order_by_value". Those values come from an external source and are not synced with our domain set (i.e. our sequences for generating primary key values) and of course are likely to conflict. All import tables must get an import id - represented by the field "id". The import process expects this field to be filled and unique so that it can be related to the data warehouse asset management and operation auditing systems. Thanks, Carlo -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: April 12, 2011 8:15 PM To: Carlo Stonebanks Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Revisiting UPDATE FROM ... ORDER BY not respected Carlo Stonebanks <stonec.register@sympatico.ca> writes: > A few years ago I asked about creating a single UPDATE statement to assign > id's from a sequence, with the sequences applied in a particular order. In > other words, order the table, then apply nextval-generated id's to the id > field in question. > Here is the original post: > http://archives.postgresql.org/pgsql-general/2009-04/msg01123.php > The explanation and solution provided made total sense. To my surprise, > every now and then users still report sequences being applied out of order. What PG versions are they using? Pre-8.2 this couldn't have been relied on at all, because the planner would still try to flatten the subselect. > Below is the code in question. I admit that I am completely baffled. Does > anyone have any clues? I don't think this really works if multiple processes try to update the table concurrently --- does that ever happen in your apps? Two processes starting to execute this query at about the same time would assign distinct sets of IDs for the same (or at least overlapping) sets of rows, and then only one of them would actually get to update any particular row. Depending on chances of timing, that could easily result in an out-of-order set of updates reaching commit. Another thought is that in READ COMMITTED mode, you're at risk of extra evaluations of nextval(), because the query tree will be re-evaluated after waiting out a conflicting update to a target row. The query as given seems safe against that effect when competing against other occurrences of itself because of the "WHERE my_table.id IS NULL" bit --- if somebody else beats you to a row update, the newer nextval value will just get dropped on the floor (if it's even generated at all, which it might not be). However, if there are ever concurrent updates to the target row that don't change the id column to non-null, you'd possibly have a problem from re-evaluations of nextval(). You could probably defend against both of those effects by taking a table lock that prevents other updates while you do this. On the whole, though, I rather wonder why you're insisting on sequential assignments at all. If this operation can be done on a whim by independent processes then sequentiality is likely to be approximate at best anyway. regards, tom lane
pgsql-general by date: