Thread: changing multiple pk's in one update
Hello all,

I have a table with a primary key column that contains sequential numbers. Sometimes I need to shift them all up or down by a fixed amount. For example, if I have four rows with primary keys 2, 3, 4, 5, I might want to shift them down by 1 by doing:

  UPDATE mytable SET id=id-1

(where "id" is the pk column) so that the pk's are now 1, 2, 3, 4.

When I try to shift them up by using +1 in the above update statement, I get (not surprisingly) a duplicate key error. I also realize that the -1 case above works only by luck.

So my question: is there some way, perhaps with ORDER BY, that I can achieve the change I want with a single update statement? (If I have an unused key range large enough, I suppose I could update all the keys to that range, and then back to my target range, but that requires two updates (there are a lot of foreign keys referencing these primary keys) and requires that I have an available range, so a single update statement would be preferable.)

Thanks for any enlightenment.
On 2009-04-08, Stuart McGraw <smcg2297@frii.com> wrote:
> I have a table with a primary key column that contains sequential
> numbers. Sometimes I need to shift them all up or down by a fixed
> amount. [...] Is there some way, perhaps with ORDER BY, that I can
> achieve the change I want with a single update statement?

begin a transaction
suspend the constraint (use SET CONSTRAINTS ... DEFERRED)
drop the index
do the update(s)
recreate the index
commit the transaction.

I see no reason to keep the index (and its associated UNIQUE constraint) during the update; AFAICT all it does is slow the process down.
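One way to read that recipe in concrete SQL (a sketch only: a non-deferrable primary key cannot be suspended with SET CONSTRAINTS, so this version drops and recreates the constraint instead; "mytable_pkey" is just the default constraint name, and any foreign keys referencing the column would have to be dropped and recreated as well):

    BEGIN;
    -- dropping the PK constraint also drops its unique index
    ALTER TABLE mytable DROP CONSTRAINT mytable_pkey;
    UPDATE mytable SET id = id + 1;
    ALTER TABLE mytable ADD CONSTRAINT mytable_pkey PRIMARY KEY (id);
    COMMIT;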
2009/4/7 Stuart McGraw <smcg2297@frii.com>:
> I have a table with a primary key column that contains sequential
> numbers. Sometimes I need to shift them all up or down by a fixed
> amount. For example, if I have four rows with primary keys 2, 3, 4, 5,
> I might want to shift them down by 1 [...]

Generally speaking, when you need to do this more than once or twice in the lifetime of your data, there's something wrong with your data model.
Jasen Betts wrote:
> begin a transaction
> suspend the constraint (use SET CONSTRAINTS ... DEFERRED)
> drop the index
> do the update(s)
> recreate the index
> commit the transaction.
>
> I see no reason to keep the index (and its associated UNIQUE
> constraint) during the update; AFAICT all it does is slow the process
> down.

Thanks for the suggestion.

Unfortunately I am doing this key renumbering in an interactive gui app, and as there are several million rows involved, rebuilding the indexes takes too long.

I have continued searching for other ways to do this, but most of my google results are floating in a sea of "update if insert fails, like mysql" results, so I still have a little hope it is possible. I thought I remembered seeing, a year or two ago, an update statement with an ordered subquery that avoided duplicate key errors, but I am probably misremembering.
On Fri, Apr 10, 2009 at 11:20 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Generally speaking, when you need to do this more than once or twice
> in the lifetime of your data, there's something wrong with your data
> model.

True, but there are a few non-traditional data models that would benefit from this feature. The two that come to my mind are the Celko nested set tree model and sequenced temporal tables.

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
Scott Marlowe wrote:
> Generally speaking, when you need to do this more than once or twice
> in the lifetime of your data, there's something wrong with your data
> model.

Generally speaking, I agree. But in this case I am not sure how to do it any better.

Simplified, the database models a dictionary. An entry in the dictionary can have a number of senses, where each sense is a sentence that gives the meaning of the sense. The order of the senses is significant.

The sense table has three columns: an fk to the dictionary entry (id number) it is part of, a sense number (a small number from 1 to the number of senses in the entry), and the sentence text. The pk is of course the entry id and the sense number. There are other tables that have fk's to the senses.

I could dispense with the sense number and use the sentence text as the second part of the composite key, but the sentence text is far more volatile than the number, and can be very long, making it quite unwieldy to use as a key.

I guess I could add an "order"[1] column and use the sense number as a surrogate partial key to avoid the need for key renumbering, but all the apis (and the normal human way of thinking) are based on "sense number 1 of entry x", "sense number 2 of entry y", so one would need to maintain "order" as a gapless sequence (or add a new mapping layer to map from/to an arbitrary monotonic sequence to a 1,2,3,... sequence) -- the gain doesn't seem that big.

Or maybe there is some other approach?

[1] yes, I know "order" is a reserved word.
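A sketch of the structure described above, in DDL (all names here are invented for illustration; the real schema no doubt differs):

    CREATE TABLE entry (
        id integer PRIMARY KEY
    );
    CREATE TABLE sense (
        entry_id  integer  NOT NULL REFERENCES entry(id),
        sense_num smallint NOT NULL,   -- 1..n within the entry; the order is significant
        txt       text     NOT NULL,   -- the sentence giving the sense's meaning
        PRIMARY KEY (entry_id, sense_num)
    );
    -- other tables point at a sense with a composite foreign key, e.g.
    --   FOREIGN KEY (entry_id, sense_num)
    --       REFERENCES sense (entry_id, sense_num) ON UPDATE CASCADE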
(JMdict? I was playing with importing that into a DB a while back, but the attributes in that XML are such a pain--and then my email died while I was trying to get those changed, and I never picked it up again.)

On Mon, Apr 13, 2009 at 1:20 PM, Stuart McGraw <smcg2297@frii.com> wrote:
> The pk is of course the entry id and the sense number.
> There are other tables that have fk's to the senses.

Your PK is a composite of (entry, order)? Won't your foreign keys elsewhere all break when you shift the order around?

> I guess I could add an "order"[1] column and use the sense number as a
> surrogate partial key to avoid the need for key renumbering, [...]
> the gain doesn't seem that big.

Why not do this in the straightforward way: three separate fields: a regular, sequential PK; an FK to the entry; and an order number. Add an index on (entry_key, order_number). It's a little more expensive since you have a new column and index (the PK), but in a table with a lot of plain text that's probably insignificant. Now you can use the plain PK for your FK's.

I'd anticipate other problems with changing your PKs. If you're referring to sense (100,3), and a concurrent user adds a sense, you may suddenly find yourself referring to a different sense due to them being reordered out from under you. You'd have similar problems if your sense is deleted entirely: instead of the row simply ceasing to exist (and resulting in predictable, checkable errors), you may end up silently referring to another sense. Maybe I'm misunderstanding what you're doing, though.

You'd have to have no UNIQUE constraint on the (entry, order) composite index, though, or you'll have the same problem when you reorder them.

--
Glenn Maynard
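In SQL terms that suggestion would look roughly like this (again with invented names, as an alternative definition of the hypothetical sense table sketched earlier):

    CREATE TABLE sense (
        id        serial  PRIMARY KEY,   -- stable surrogate key; other tables FK to this
        entry_id  integer NOT NULL REFERENCES entry(id),
        sense_num integer NOT NULL,      -- display order only; can be renumbered freely
        txt       text    NOT NULL
    );
    CREATE INDEX sense_entry_order ON sense (entry_id, sense_num);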
On Mon, Apr 13, 2009 at 2:09 PM, Glenn Maynard <glennfmaynard@gmail.com> wrote:
> Your PK is a composite of (entry, order)? Won't your foreign keys
> elsewhere all break when you shift the order around?

If there really are foreign keys, then an update will not be allowed to shift a primary key unless the foreign key is declared with ON UPDATE CASCADE, in which case the shifts are cascaded to all references automatically.

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
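For example, with the composite-key layout from the earlier sketch, a child table declared like this follows a renumbering of its parent senses automatically (names invented):

    CREATE TABLE gloss (
        entry_id  integer NOT NULL,
        sense_num integer NOT NULL,
        gloss_txt text    NOT NULL,
        FOREIGN KEY (entry_id, sense_num)
            REFERENCES sense (entry_id, sense_num)
            ON UPDATE CASCADE
    );
    -- e.g. UPDATE sense SET sense_num = 9 WHERE entry_id = 10 AND sense_num = 3;
    -- the matching gloss rows are renumbered along with the sense.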
On Mon, Apr 13, 2009 at 5:18 PM, Richard Broersma <richard.broersma@gmail.com> wrote:
>> Your PK is a composite of (entry, order)? Won't your foreign keys
>> elsewhere all break when you shift the order around?
>
> If there really are foreign keys, then an update will not be allowed
> to shift a primary key unless the foreign key is declared with ON UPDATE
> CASCADE, in which case the shifts are cascaded to all references
> automatically.

Right. I'm still dusting off my SQL after a long period of corrosion. :)

The same applies, though, if you have external references to the PK; for example, if you have a web interface where "/sense/10/3" refers to entry 10, sense 3. If your senses are reordered by someone else, and you operate on /10/3, you may suddenly find yourself viewing or modifying (or deleting!) a different sense. This could even happen within the same transaction, if you're not very careful with locking...

--
Glenn Maynard
On Mon, Apr 13, 2009 at 2:32 PM, Glenn Maynard <glennfmaynard@gmail.com> wrote:
> If your senses are reordered by someone else, and
> you operate on /10/3, you may suddenly find yourself viewing or
> modifying (or deleting!) a different sense. This could even happen
> within the same transaction, if you're not very careful with
> locking...

From what I've seen, this problem can affect both surrogate and natural key designs. In both cases, care must be taken to ensure that an underlying tuple hasn't been changed by any other client before it attempts to commit its changes. Probably the most common solution is to use optimistic locking; another solution that I know of is to use serializable transaction isolation.

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
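A common shape for the optimistic-locking idea, sketched against the hypothetical sense table used above (the version column is added purely for illustration, it is not part of the schema described in the posts):

    ALTER TABLE sense ADD COLUMN version integer NOT NULL DEFAULT 0;

    -- the reader remembers the version it saw, e.g.
    --   SELECT txt, version FROM sense WHERE entry_id = 100 AND sense_num = 3;
    -- and the writer only succeeds if the row is still unchanged
    -- (here, version 7 is what the reader saw):
    UPDATE sense
       SET txt = 'revised meaning', version = version + 1
     WHERE entry_id = 100 AND sense_num = 3 AND version = 7;
    -- "UPDATE 0" means another client changed or removed the sense in the
    -- meantime: report a conflict, or re-read and retry.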
On Mon, Apr 13, 2009 at 5:43 PM, Richard Broersma <richard.broersma@gmail.com> wrote:
> From what I've seen, this problem can affect both surrogate and
> natural key designs. In both cases, care must be taken to ensure that
> an underlying tuple hasn't been changed by any other client before it
> attempts to commit its changes. Probably the most common solution is
> to use optimistic locking; another solution that I know of is to use
> serializable transaction isolation.

Right, but if you have a delayed UI, you probably don't want to hold a lock open--if the user is viewing "/100/3" and clicks "delete", you need to make sure that the one you delete is the same /100/3 that the user was viewing at the time. That's harder to do...

--
Glenn Maynard
Glenn Maynard wrote:
> (JMdict?

yup. ;-)

> I was playing with importing that into a DB a while back,
> but the attributes in that XML are such a pain--and then my email died
> while I was trying to get those changed, and I never picked it up
> again.)
>
> Your PK is a composite of (entry, order)? Won't your foreign keys
> elsewhere all break when you shift the order around?

As Richard Broersma pointed out, the fk's are all "cascade update".

> Why not do this in the straightforward way: three separate fields: a
> regular, sequential PK; an FK to the entry; and an order number. Add
> an index on (entry_key, order_number). [...] Now you can use the
> plain PK for your FK's.

I have done a few other simple database apps in the past and always did things that way. The current app also started that way, but I tried, experimentally, using composite keys, and was amazed how much simpler things became. For example, to copy an entry one duplicates the entry's rows (in all the related tables) and need only change the entry id number in the duplicated rows. When each child table has an independent serial id, this becomes more complicated and requires looping in the copy function (AFAICT). I've noted that a number of other things I was doing in the database api and in Postgresql functions similarly became simpler.

> I'd anticipate other problems with changing your PKs. If you're
> referring to sense (100,3), and a concurrent user adds a sense, you
> may suddenly find yourself referring to a different sense due to them
> being reordered out from under you. [...]

I try to avoid a lot of problems like that by doing very little actual updating (in the database sense). Most changes are made to a new copy of an entry. When an update is actually done (such as to delete the original copy of an entry), it is done in a very short time interval with serializable isolation.

> Maybe I'm misunderstanding what you're doing, though.
>
> You'd have to have no UNIQUE constraint on the (entry, order)
> composite index, though, or you'll have the same problem when you
> reorder them.

Since the data in the database is of primary importance, and the app(s) secondary, I really would like as much constraint checking in the database as possible.
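With composite keys, the copy operation described above comes down to one statement per child table (hypothetical names again; 100 is the source entry and 200 the copy):

    INSERT INTO sense (entry_id, sense_num, txt)
    SELECT 200, sense_num, txt
      FROM sense
     WHERE entry_id = 100;
    -- the same pattern repeats for every other child table: only the entry id
    -- changes, the rest of each composite key carries over unchanged.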
Glenn Maynard <glennfmaynard@gmail.com> wrote:
> Why not do this in the straightforward way: three separate fields: a
> regular, sequential PK; an FK to the entry; and an order number. Add
> an index on (entry_key, order_number). It's a little more expensive
> since you have a new column and index (the PK), but in a table with a
> lot of plain text that's probably insignificant. Now you can use the
> plain PK for your FK's.

I'd agree with this approach. I have a number of tables which are sensitive to arbitrary ordering and they sound roughly similar to your use-case (though my tables are probably smaller).

My approach is to create a string column in the table which permits defining arbitrary ordering. I use a string field b/c it's easier for me to stuff (by hand) new ordered records in between other existing records. But an integer would work just as well, so long as you make sure you keep enough space between the integers (e.g. 1000, 2000, 3000).

Also, if your ordered list is truly "ordinal" (each record is either 1st, 2nd, 3rd, etc. in a single list) you could just use 1, 2, 3, 4 for the ordering, but then you have to mess with two records in order to swap the positions of (say) items 2 and 3. Of course you can do this pretty easily inside a transaction, and you don't have to worry about the mess of moving PK's.

Steve
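A sketch of the gapped-ordering idea (invented table; the order values are spaced by 1000 so a row can be slotted in between two others without renumbering anything):

    CREATE TABLE item (
        id  serial  PRIMARY KEY,
        ord integer NOT NULL      -- 1000, 2000, 3000, ...
    );
    INSERT INTO item (ord) VALUES (1000), (2000), (3000);
    -- insert a new item between the first and second without touching either:
    INSERT INTO item (ord) VALUES (1500);
    SELECT id FROM item ORDER BY ord;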
On 2009-04-13, Stuart McGraw <smcg2297@frii.com> wrote:
> Unfortunately I am doing this key renumbering in
> an interactive gui app, and as there are several million
> rows involved, rebuilding the indexes takes too long.

the update takes a long time too if it's updating all the rows, and updating the index piecewise at the same time.

with the index extant I get from 20 seconds (if the start and end ranges don't overlap) to 28 seconds (maximum overlap) for a table with 1000000 (short) rows.

it takes 18 seconds if I first drop the index, then update, then restore the index. so, about the same amount of time or slightly faster by dropping the index. if I wrap them in a transaction it takes 30s each way.

> I have continued searching for other ways to do this,
> but most of my google results are floating in a sea
> of "update if insert fails, like mysql" results, so
> I still have a little hope it is possible.
> I thought I remembered seeing, a year or two ago, an
> update statement with an ordered subquery that avoided
> duplicate key errors, but I am probably misremembering.

maybe you can do it using a cursor? I've not looked at them yet.
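For what it's worth, a loop that renumbers the rows one at a time in the right order never collides, because each target value is still free (or has just been freed) when it is assigned. A sketch of that idea against the simple one-column table from the original post (this uses a DO block, which needs PostgreSQL 9.0 or later; a PL/pgSQL function or a client-side cursor would do the same job on older versions, and row-at-a-time updates will of course be slow on millions of rows):

    DO $$
    DECLARE
        r record;
    BEGIN
        -- shifting up: walk the keys from highest to lowest so the target
        -- value is always unused when each row is updated
        FOR r IN SELECT id FROM mytable ORDER BY id DESC LOOP
            UPDATE mytable SET id = r.id + 1 WHERE id = r.id;
        END LOOP;
    END;
    $$;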
On Wed, Apr 15, 2009 at 8:43 AM, Jasen Betts <jasen@xnet.co.nz> wrote:
> the update takes a long time too if it's updating all the rows,
> and updating the index piecewise at the same time.
> with the index extant I get from 20 seconds (if the start and end
> ranges don't overlap) to 28 seconds (maximum overlap) for a table
> with 1000000 (short) rows.

He's using a compound key as his PK, eg. (entry, order), where there are only a few order values for each entry, so I think the real case is only updating order for a specific set of entries.

>> I have continued searching for other ways to do this,
>> but most of my google results are floating in a sea
>> of "update if insert fails, like mysql" results, so
>> I still have a little hope it is possible.
>> I thought I remembered seeing, a year or two ago, an
>> update statement with an ordered subquery that avoided
>> duplicate key errors, but I am probably misremembering.

Bear in mind that the update is failing based on the order the data is in the table, not the PK order.

create table test (id integer primary key);
insert into test (id) values (2), (1), (3);

update test set id=id+1;
ERROR:  duplicate key value violates unique constraint "test_pkey"
update test set id=id-1;
ERROR:  duplicate key value violates unique constraint "test_pkey"

Both fail, because it tries to update 2 first.

I suppose in a real pinch, you could renumber in two steps. For example, if you wanted to delete id 6 and move everything else down:

insert into test (id) values (2), (7), (3), (1), (4), (5), (6), (8), (9);

begin;
set transaction isolation level serializable;
delete from test where id=6;
update test set id = id+1000000 where id >= 6;
update test set id = id-1000001 where id >= 6;
commit;

Not very nice, but if "id" is really a sequence number starting at 1 in your case, and not an always-increasing generated regular serial (so there's no chance of it actually reaching the arbitrarily large number 1000000), it should work. (It'd probably be workable for real serials, too, with a much larger offset.)

If someone else creates a new sense for that entry after the first update, it'll sit on the order number you were about to use and the operation will fail. Serialize so nobody else will insert until you're done.

--
Glenn Maynard