Thread: A problem with sequences...
Hi, everybody! I am experiencing some weird problem that I was hoping you could shed some light on... This is a little complicated, and, I bet the first thing I am going to hear from you is "change your schema" :-) I will, most probably, end up doing that, but before I get into it, I would like to understand completely what is going onhere, so, please, bear with me for a while. The background is, that I have some tables, that are being populated by a java application through jdbc (don't stop readingbecause of that - it is very unlikely to matter that the stuff goes through jdbc!) The java app wants to get back the id (pk) of the rows it inserts, and it does not want to make a separate query for that. We use rules to achive that. For example: create table answer ( id serial primary key, data text ); create rule answer_id_seq as on insert to answer do select coalesce (new.id, last_value) as id from answer_id_seq; (for those who wonders why I did it this way, and not using currval, I'll explain it in the end). I was surprised to find out that at the time rule is executed, the sequence is already advanced, but the new.id is stillnull (looks like the defaults just never make it into the new.* at all), but apparently, that is the case. So, with this setup my java app is able to execute a statement that inserts a new row, get back a ResultSet and fetch anid from it. And it works just fine most of the time. *However*, every now and then (and I was never able to reproduce it on purpose, so I don't know what exactly the circumstancesare) I get a weird exception from java, complaining about at attempt to insert a duplicate into answer_pkey after executing astatement like insert into answer (data) values ('blah'); So, it looks like my rule somehow manages to screw up the sequence. Does it make sense to anyone? How could it happen? I understand that this rule is no good, and I need to fix it (because it could return an incorrect valueif two connections happen to insert into the same table simultaneously), but still - I want to understand how does it manageto screw up that sequence just by *looking* at it??? The reason I want to understand what is going on here is to be sure that, by fixing this rule, I will really get this problemgo away (it shows up pretty rarely, and I cannot reproduce it on purpose, so I have no way to verify that, other than figuring outexactly what is going on). Does all this make any sense to any of you? Can you imagine some situation when with a setup like this an insert statementwould get a value from the sequence that was already used? I would greatly appreciate any help. Thanks a lot! Dima. P.S. As I promised, the reason that rule is not using currval is just that it may or may not be set for a given insert statement- if the id is explicitly specified, a call to currval would fail (or even return a wrong value if the sequence was accessedpreviously in the same session)... Perhaps, I could work around that, by creating two rules - with 'where new.id is null' and 'where new.id is not null', andhaving the first one use currval, and the other one just return new.id, but it just happened to be done this way... As I said above, I understand that there are all kinds of problems with this rule, and I am going to get rid of it eitherway (I am thinking, about just making a change on the java side and appending ';select currval(..)' to those inserts). However, I would still love to understand what exactly goes on with those duplicate ids, right now. Thanks again!
On Wednesday 19 Feb 2003 11:05 pm, Dmitry Tkach wrote: > create table answer > ( > id serial primary key, > data text > ); > > create rule answer_id_seq as on insert to answer do select coalesce > (new.id, last_value) as id from answer_id_seq; > *However*, every now and then (and I was never able to reproduce it on > purpose, so I don't know what exactly the circumstances are) I get a weird > exception from java, complaining about at attempt to insert a duplicate > into answer_pkey after executing a statement like > > insert into answer (data) values ('blah'); > P.S. As I promised, the reason that rule is not using currval is just that > it may or may not be set for a given insert statement - if the id is > explicitly specified, a call to currval would fail (or even return a wrong ^^^^^^^^^^^^^^^^^^ Before looking into more obscure possibilities, are you using the sequence to generate these explicit id's? If not, that's why you're getting duplicates, the sequence generator doesn't know you've used these numbers. > value if the sequence was accessed previously in the same session)... > Perhaps, I could work around that, by creating two rules - with 'where > new.id is null' and 'where new.id is not null', and having the first one > use currval, and the other one just return new.id, but it just happened to > be done this way... As I said above, I understand that there are all kinds > of problems with this rule, and I am going to get rid of it either way (I > am thinking, about just making a change on the java side and appending > ';select currval(..)' to those inserts). However, I would still love to > understand what exactly goes on with those duplicate ids, right now. -- Richard Huxton
> > >Before looking into more obscure possibilities, are you using the sequence to >generate these explicit id's? If not, that's why you're getting duplicates, >the sequence generator doesn't know you've used these numbers. > > > Yeah... I understand this. In this case, the id is *never* specified explicitly. Java app either knows the id or it does not. If it knows it, it does the update, otherwise, it does an insert, with *no* id specified, and gets the new id back from that rule, so that time it will know the id and end up doing update... I was referring to the situation in general when somehow (like data migration) the id is specified, I just don't want that rule to barf. It is definitely not what's causing my problem. So, let's get into those 'obscure possibilities' now :-) Thanks! Dima.
On Thursday 20 Feb 2003 2:56 pm, Dima Tkach wrote: > >Before looking into more obscure possibilities, are you using the sequence > > to generate these explicit id's? If not, that's why you're getting > > duplicates, the sequence generator doesn't know you've used these > > numbers. > > Yeah... I understand this. > In this case, the id is *never* specified explicitly. Java app either > knows the id or it does not. > If it knows it, it does the update, otherwise, it does an insert, with > *no* id specified, and gets the new id back from that rule, so that time > it will > know the id and end up doing update... > > I was referring to the situation in general when somehow (like data > migration) the id is specified, I just don't want that rule to barf. It > is definitely not what's causing my problem. So, let's get into those > 'obscure possibilities' now :-) Well, in that case you need to setval() the sequence to something bigger than any used numbers after the import. OK - let's look at the rule: create table answer ( id serial primary key, data text ); create rule answer_id_seq as on insert to answer do select coalesce (new.id, last_value) as id from answer_id_seq; Well - you're going to have problems if you do something like: INSERT INTO answer (data) (SELECT d FROM foo); I daresay you're not, but something to bear in mind. To see why you're getting problems with duplicate ID numbers, open two psql windows and do: 1> SELECT nextval('answer_id_seq'); 2> SELECT nextval('answer_id_seq'); 1> SELECT last_value FROM answer_id_seq; 2> SELECT last_value FROM answer_id_seq; As you'll see, last_value isn't concurrent-safe like currval() and nextval() are. So - if you want to keep your rule, you'll want to rewrite it to use currval() as you mentioned. Personally, I'd write a function to insert into the table and make the app use that, or create a view and have the app insert via that. Getting a result-set back from an insert would spook me if I wasn't expecting it. -- Richard Huxton
> > >Well, in that case you need to setval() the sequence to something bigger than >any used numbers after the import. > > I know. (And I do). >OK - let's look at the rule: > >create table answer >( > id serial primary key, > data text >); > >create rule answer_id_seq as on insert to answer do select coalesce (new.id, >last_value) as id from answer_id_seq; > >Well - you're going to have problems if you do something like: > >INSERT INTO answer (data) (SELECT d FROM foo); > >I daresay you're not, but something to bear in mind. > That's fine. The problen in that case would be that I'll be always getting back the last id inserted, right. I've seen this happenning, and it is OK, because I only need that output in this java app, and the java app only inserts them one at a time. (In fact, the most annoying thing with statement like that is having to scroll through all of that useless output if there are too many rows in foo - so, I usually just disable the rule whenever I am about to do something like that).... > >To see why you're getting problems with duplicate ID numbers, open two psql >windows and do: > >1> SELECT nextval('answer_id_seq'); >2> SELECT nextval('answer_id_seq'); >1> SELECT last_value FROM answer_id_seq; >2> SELECT last_value FROM answer_id_seq; > >As you'll see, last_value isn't concurrent-safe like currval() and nextval() >are. > Well... yeah. I know this. That's why I said in the very beginning, that this rule is no good. *However* this would only exp[lain a situation with my java app getting an incorrect id back after inserting a row, but it never uses that id to insert other rows (if it did, it would cause a duplication even if the rule was safe), so, I still don't see any reason how this would cause a duplicated id to be inserted - nextval() should still return unique numbers, right? Once again, I know that last_val may not be the same thing nextval() just returned, but the real question is how can this possibly cause a duplicated id to be genrated??? > >So - if you want to keep your rule, you'll want to rewrite it to use currval() >as you mentioned. > > Nah... I guess, I'll rather do something in the java (append ";select currval..." to the insert statement). >Personally, I'd write a function to insert into the table and make the app use >that, or create a view and have the app insert via that. Getting a result-set >back from an insert would spook me if I wasn't expecting it. > Well... If you do not expect it, you can just ignore it - it's not a big deal :-) I don't see how having a view would help... As for the function, I would have to create many of them - one for each table I need to insert into, because the arguments would be different, and with the rule I was able to get away with only writing it once ... If postgres had something like (void *) to pass to a function (and also if it allowed variable number of parameters), I would ceratinly stick with the function solution, because then I would also be able to cache a query plan... Dima
I got it! For those, who are still wonderring, the actual problem had nothing to do with that rule. I was loading some stuff into the database through a sql script (copy from etc...), and in the end of it I did: select setval('answer_id_seq', id) from answer order by id desc limit 1; Now, for some reason this reports a correct value, but what actually gets set is wrong! Here is an example: rapidb=# select max(id) from answer; max ------- 25000 (1 row) rapidb=# select last_value from answer_id_seq; last_value ------------ 22124 (1 row) rapidb=# select setval ('answer_id_seq', id) from answer order by id desc limit 1; setval -------- 25000 (1 row) rapidb=# select currval('answer_id_seq'); currval --------- 21452 (1 row) So, the question I have now is - what's going on? How come setval() reports 25000, but the currval changes to 21452 If my query is wrong (and I can't really see what's wrong with it), then why does it report the correct value? I have rewritten that script to do select setval ('answer_id_seq', max(id)) from answer instead, and that works fine... But I'd still love to find out what is wrong with that original query's behaviour... Thanks! Dima
On Fri, 21 Feb 2003, Dmitry Tkach wrote: > rapidb=# select last_value from answer_id_seq; > last_value > ------------ > 22124 > (1 row) > > rapidb=# select setval ('answer_id_seq', id) from answer order by id desc limit 1; > setval > -------- > 25000 > (1 row) > > rapidb=# select currval('answer_id_seq'); > currval > --------- > 21452 > (1 row) > > So, the question I have now is - what's going on? > How come setval() reports 25000, but the currval changes to 21452 > If my query is wrong (and I can't really see what's wrong with it), then why does it report the correct value? IIRC, currval returns the last value given by the sequence (in nextval) to this session. What does a select nextval('answer_id_seq') give you? 21453 or 25001?
Dmitry Tkach <dmitry@openratings.com> writes: > select setval('answer_id_seq', id) from answer order by id desc limit 1; > > Now, for some reason this reports a correct value, but what actually > gets set is wrong! I'm guessing that 'setval' is getting called more than once here. Your 'LIMIT 1' controls how many rows are returned to the client, but the server is probably generating more rows internally. So this is just wrong, and > select setval ('answer_id_seq', max(id)) from answer is right. -Doug
Stephan Szabo wrote: >IIRC, currval returns the last value given by the sequence (in nextval) to >this session. What does a select nextval('answer_id_seq') give you? >21453 or 25001? > > 2143 Dima
Doug McNaught wrote: >Dmitry Tkach <dmitry@openratings.com> writes: > > > >>select setval('answer_id_seq', id) from answer order by id desc limit 1; >> >>Now, for some reason this reports a correct value, but what actually >>gets set is wrong! >> >> > >I'm guessing that 'setval' is getting called more than once here. >Your 'LIMIT 1' controls how many rows are returned to the client, but >the server is probably generating more rows internally. So this is >just wrong, and > > You are right ! That's it! It does get called twice. There is even a nice comment in ExecLimit(): * NOTE: when scanning forwards, we must fetch one tuple beyond the * COUNT limit before we can return NULL, else the subplan won't * be properly positioned to start going backwards. Hence test * here is for position > netlimit not position >= netlimit. Whatever that means, that's what was causing my problems... Thanks! Dima
On 21 Feb 2003, Doug McNaught wrote: > Dmitry Tkach <dmitry@openratings.com> writes: > > > select setval('answer_id_seq', id) from answer order by id desc limit 1; > > > > Now, for some reason this reports a correct value, but what actually > > gets set is wrong! > > I'm guessing that 'setval' is getting called more than once here. > Your 'LIMIT 1' controls how many rows are returned to the client, but > the server is probably generating more rows internally. So this is I'd totally missed that, it's probably doing a plan of seqscan + sort to run the query. > > select setval ('answer_id_seq', max(id)) from answer > > is right. Or possibly: select setval('answer_id_seq', id) from (select id from answer order by id desc limit 1) as foo; which might if the table gets big enough use an index scan.
Dmitry Tkach <dmitry@openratings.com> writes: > Doug McNaught wrote: >> I'm guessing that 'setval' is getting called more than once here. >> Your 'LIMIT 1' controls how many rows are returned to the client, but >> the server is probably generating more rows internally. >> > You are right ! That's it! It does get called twice. > There is even a nice comment in ExecLimit(): BTW, in CVS tip ExecLimit has been rewritten to not do this, so the query will behave as you expect in 7.4. Still, functions with side-effects are really really dangerous in any but the simplest kind of SELECT, because the planner is pretty cavalier about rearranging things. I'd advise doing this instead: select setval('answer_id_seq', (select id from answer order by id desc limit 1)); Here, you *know* that the setval will be called exactly once. regards, tom lane