Thread: A problem with sequences...

A problem with sequences...

From
Dmitry Tkach
Date:
Hi, everybody!

I am experiencing some weird problem that I was hoping you could shed some light on...

This is a little complicated, and, I bet the first thing I am going to hear from you is "change your schema" :-)
I will, most probably, end up doing that, but before I get into it, I would like to understand completely what is going
onhere, 
so, please, bear with me for a while.

The background is, that I have some tables, that are being populated by a java application through jdbc (don't stop
readingbecause of that - 
it is very unlikely to matter that the stuff goes through jdbc!)

The java app wants to get back the id (pk) of the rows it inserts, and it does not want to make a separate query for
that.
We use rules to achive that. For example:

create table answer
(
    id    serial primary key,
    data  text
);

create rule answer_id_seq as on insert to answer do select coalesce (new.id, last_value) as id from answer_id_seq;

(for those who wonders why I did it this way, and not using currval, I'll explain it in the end).

I was surprised to find out that at the time rule is executed, the sequence is already advanced, but the new.id is
stillnull (looks like 
the defaults just never make it into the new.* at all), but apparently, that is the case.

So, with this setup my java app is able to execute a statement that inserts a new row, get back a ResultSet and fetch
anid from it. 
And it works just fine most of the time.

*However*, every now and then (and I was never able to reproduce it on purpose, so I don't know what exactly the
circumstancesare) 
I get a weird exception from java, complaining about at attempt to insert a duplicate into answer_pkey after executing
astatement like 

insert into answer (data) values ('blah');

So, it looks like my rule somehow manages to screw up the sequence. Does it make sense to anyone?
How could it happen? I understand that this rule is no good, and I need to fix it (because it could return an incorrect
valueif 
two connections happen to insert into the same table simultaneously), but still - I want to understand how does it
manageto screw 
up that sequence just by *looking* at it???

The reason I want to understand what is going on here is to be sure that, by fixing this rule, I will really get this
problemgo away 
(it shows up pretty rarely, and I cannot reproduce it on purpose, so I have no way to verify that, other than figuring
outexactly what is 
going on).

Does all this make any sense to any of you? Can you imagine some situation when with a setup like this an insert
statementwould get 
a value from the sequence that was already used?

I would greatly appreciate any help.

Thanks a lot!

Dima.

P.S. As I promised, the reason that rule is not using currval is just that it may or may not be set for a given insert
statement- 
if the id is explicitly specified, a call to currval would fail (or even return a wrong value if the sequence was
accessedpreviously 
in the same session)...
Perhaps, I could work around that, by creating two rules - with 'where new.id is null' and 'where new.id is not null',
andhaving 
the first one use currval, and the other one just return new.id, but it just happened to be done this way...
As I said above, I understand that there are all kinds of problems with this rule, and I am going to get rid of it
eitherway (I am thinking, 
about just making a change on the java side and appending ';select currval(..)' to those inserts).
However, I would still love to understand what exactly goes on with those duplicate ids, right now.

Thanks again!


Re: A problem with sequences...

From
Richard Huxton
Date:
On Wednesday 19 Feb 2003 11:05 pm, Dmitry Tkach wrote:
> create table answer
> (
>     id    serial primary key,
>     data  text
> );
>
> create rule answer_id_seq as on insert to answer do select coalesce
> (new.id, last_value) as id from answer_id_seq;

> *However*, every now and then (and I was never able to reproduce it on
> purpose, so I don't know what exactly the circumstances are) I get a weird
> exception from java, complaining about at attempt to insert a duplicate
> into answer_pkey after executing a statement like
>
> insert into answer (data) values ('blah');

> P.S. As I promised, the reason that rule is not using currval is just that
> it may or may not be set for a given insert statement - if the id is
> explicitly specified, a call to currval would fail (or even return a wrong
  ^^^^^^^^^^^^^^^^^^

Before looking into more obscure possibilities, are you using the sequence to
generate these explicit id's? If not, that's why you're getting duplicates,
the sequence generator doesn't know you've used these numbers.

> value if the sequence was accessed previously in the same session)...
> Perhaps, I could work around that, by creating two rules - with 'where
> new.id is null' and 'where new.id is not null', and having the first one
> use currval, and the other one just return new.id, but it just happened to
> be done this way... As I said above, I understand that there are all kinds
> of problems with this rule, and I am going to get rid of it either way (I
> am thinking, about just making a change on the java side and appending
> ';select currval(..)' to those inserts). However, I would still love to
> understand what exactly goes on with those duplicate ids, right now.

--
  Richard Huxton

Re: A problem with sequences...

From
Dima Tkach
Date:
>
>
>Before looking into more obscure possibilities, are you using the sequence to
>generate these explicit id's? If not, that's why you're getting duplicates,
>the sequence generator doesn't know you've used these numbers.
>
>
>
Yeah... I understand this.
In this case, the id is *never* specified explicitly. Java app either
knows the id or it does not.
If it knows it, it does the update, otherwise, it does an insert, with
*no* id specified, and gets the new id back from that rule, so that time
it will
know the id and end up doing update...

I was referring to the situation in general when somehow (like data
migration) the id is specified, I just don't want that rule to barf. It
is definitely not what's causing my problem. So, let's get into those
'obscure possibilities' now :-)

Thanks!

Dima.




Re: A problem with sequences...

From
Richard Huxton
Date:
On Thursday 20 Feb 2003 2:56 pm, Dima Tkach wrote:
> >Before looking into more obscure possibilities, are you using the sequence
> > to generate these explicit id's? If not, that's why you're getting
> > duplicates, the sequence generator doesn't know you've used these
> > numbers.
>
> Yeah... I understand this.
> In this case, the id is *never* specified explicitly. Java app either
> knows the id or it does not.
> If it knows it, it does the update, otherwise, it does an insert, with
> *no* id specified, and gets the new id back from that rule, so that time
> it will
> know the id and end up doing update...
>
> I was referring to the situation in general when somehow (like data
> migration) the id is specified, I just don't want that rule to barf. It
> is definitely not what's causing my problem. So, let's get into those
> 'obscure possibilities' now :-)

Well, in that case you need to setval() the sequence to something bigger than
any used numbers after the import.

OK - let's look at the rule:

create table answer
(
    id    serial primary key,
    data  text
);

create rule answer_id_seq as on insert to answer do select coalesce (new.id,
last_value) as id from answer_id_seq;

Well - you're going to have problems if you do something like:

INSERT INTO answer (data) (SELECT d FROM foo);

I daresay you're not, but something to bear in mind.

To see why you're getting problems with duplicate ID numbers, open two psql
windows and do:

1> SELECT nextval('answer_id_seq');
2> SELECT nextval('answer_id_seq');
1> SELECT last_value FROM answer_id_seq;
2> SELECT last_value FROM answer_id_seq;

As you'll see, last_value isn't concurrent-safe like currval() and nextval()
are.

So - if you want to keep your rule, you'll want to rewrite it to use currval()
as you mentioned.

Personally, I'd write a function to insert into the table and make the app use
that, or create a view and have the app insert via that. Getting a result-set
back from an insert would spook me if I wasn't expecting it.
--
  Richard Huxton

Re: A problem with sequences...

From
Dmitry Tkach
Date:
>
>
>Well, in that case you need to setval() the sequence to something bigger than
>any used numbers after the import.
>
>
I know. (And I do).

>OK - let's look at the rule:
>
>create table answer
>(
>    id    serial primary key,
>    data  text
>);
>
>create rule answer_id_seq as on insert to answer do select coalesce (new.id,
>last_value) as id from answer_id_seq;
>
>Well - you're going to have problems if you do something like:
>
>INSERT INTO answer (data) (SELECT d FROM foo);
>
>I daresay you're not, but something to bear in mind.
>
That's fine. The problen in that case would be that I'll be always
getting back the last id inserted, right.
I've seen this happenning, and it is OK, because I only need that output
in this java app, and the java app only inserts them
one at a time.
(In fact, the most annoying thing with statement like that is having to
scroll through all of that useless output if there are too many rows in
foo - so, I usually just disable the rule whenever I am about to do
something like that)....

>
>To see why you're getting problems with duplicate ID numbers, open two psql
>windows and do:
>
>1> SELECT nextval('answer_id_seq');
>2> SELECT nextval('answer_id_seq');
>1> SELECT last_value FROM answer_id_seq;
>2> SELECT last_value FROM answer_id_seq;
>
>As you'll see, last_value isn't concurrent-safe like currval() and nextval()
>are.
>
Well... yeah. I know this. That's why I said in the very beginning, that
this rule is no good.
*However* this would only exp[lain a situation with my java app getting
an incorrect id back after inserting a row, but it never uses that id to
insert other rows (if it did, it would cause a duplication even if the
rule was safe), so, I still don't see any reason how this would cause a
duplicated id to be inserted - nextval() should still return unique
numbers, right? Once again, I know that last_val may not be the same
thing nextval() just returned, but the real question is how can this
possibly cause a duplicated id to be genrated???

>
>So - if you want to keep your rule, you'll want to rewrite it to use currval()
>as you mentioned.
>
>
Nah... I guess, I'll rather do something in the java (append ";select
currval..." to the insert statement).

>Personally, I'd write a function to insert into the table and make the app use
>that, or create a view and have the app insert via that. Getting a result-set
>back from an insert would spook me if I wasn't expecting it.
>
Well... If you do not expect it, you can just ignore it - it's not a big
deal :-)
I don't see how having a view would help...

As for the function, I would have to create many of them - one for each
table I need to insert into, because the arguments would be different,
and with the rule I was able to get away with only writing it once ...
If postgres had something like (void *) to pass to a function (and also
if it allowed variable number of parameters), I would ceratinly stick
with the function solution, because then I would also be able to cache a
query plan...

Dima





Re: A problem with sequences...

From
Dmitry Tkach
Date:
I got it!

For those, who are still wonderring, the actual problem had nothing to do with that rule.

I was loading some stuff into the database through a sql script (copy from etc...), and in the end of it I did:

select setval('answer_id_seq', id) from answer order by id desc limit 1;

Now, for some reason this reports a correct value, but what actually gets set is wrong!
Here is an example:

rapidb=# select max(id) from answer;
   max
-------
  25000
(1 row)

rapidb=# select last_value from answer_id_seq;
  last_value
------------
       22124
(1 row)

rapidb=# select setval ('answer_id_seq', id) from answer order by id desc limit 1;
  setval
--------
   25000
(1 row)

rapidb=# select currval('answer_id_seq');
  currval
---------
    21452
(1 row)

So, the question I have now is - what's going on?
How come setval() reports 25000, but the currval changes to 21452
If my query is wrong (and I can't really see what's wrong with it), then why does it report the correct value?

I have rewritten that script to do
select setval ('answer_id_seq', max(id)) from answer
instead, and that works fine... But I'd still love to find out what is wrong with that original query's behaviour...

Thanks!

Dima


Re: A problem with sequences...

From
Stephan Szabo
Date:
On Fri, 21 Feb 2003, Dmitry Tkach wrote:

> rapidb=# select last_value from answer_id_seq;
>   last_value
> ------------
>        22124
> (1 row)
>
> rapidb=# select setval ('answer_id_seq', id) from answer order by id desc limit 1;
>   setval
> --------
>    25000
> (1 row)
>
> rapidb=# select currval('answer_id_seq');
>   currval
> ---------
>     21452
> (1 row)
>
> So, the question I have now is - what's going on?
> How come setval() reports 25000, but the currval changes to 21452
> If my query is wrong (and I can't really see what's wrong with it), then why does it report the correct value?

IIRC, currval returns the last value given by the sequence (in nextval) to
this session.  What does a select nextval('answer_id_seq') give you?
21453 or 25001?


Re: A problem with sequences...

From
Doug McNaught
Date:
Dmitry Tkach <dmitry@openratings.com> writes:

> select setval('answer_id_seq', id) from answer order by id desc limit 1;
>
> Now, for some reason this reports a correct value, but what actually
> gets set is wrong!

I'm guessing that 'setval' is getting called more than once here.
Your 'LIMIT 1' controls how many rows are returned to the client, but
the server is probably generating more rows internally.  So this is
just wrong, and

> select setval ('answer_id_seq', max(id)) from answer

is right.

-Doug

Re: A problem with sequences...

From
Dmitry Tkach
Date:
Stephan Szabo wrote:

>IIRC, currval returns the last value given by the sequence (in nextval) to
>this session.  What does a select nextval('answer_id_seq') give you?
>21453 or 25001?
>
>
2143


Dima


Re: A problem with sequences...

From
Dmitry Tkach
Date:
Doug McNaught wrote:

>Dmitry Tkach <dmitry@openratings.com> writes:
>
>
>
>>select setval('answer_id_seq', id) from answer order by id desc limit 1;
>>
>>Now, for some reason this reports a correct value, but what actually
>>gets set is wrong!
>>
>>
>
>I'm guessing that 'setval' is getting called more than once here.
>Your 'LIMIT 1' controls how many rows are returned to the client, but
>the server is probably generating more rows internally.  So this is
>just wrong, and
>
>
You are right ! That's it! It does get called twice.
There is even a nice comment in ExecLimit():

       * NOTE: when scanning forwards, we must fetch one tuple beyond the
       * COUNT limit before we can return NULL, else the subplan won't
       * be properly positioned to start going backwards.  Hence test
       * here is for position > netlimit not position >= netlimit.

Whatever that means, that's what was causing my problems...

Thanks!

Dima





Re: A problem with sequences...

From
Stephan Szabo
Date:
On 21 Feb 2003, Doug McNaught wrote:

> Dmitry Tkach <dmitry@openratings.com> writes:
>
> > select setval('answer_id_seq', id) from answer order by id desc limit 1;
> >
> > Now, for some reason this reports a correct value, but what actually
> > gets set is wrong!
>
> I'm guessing that 'setval' is getting called more than once here.
> Your 'LIMIT 1' controls how many rows are returned to the client, but
> the server is probably generating more rows internally.  So this is

I'd totally missed that, it's probably doing a plan of seqscan + sort to
run the query.

> > select setval ('answer_id_seq', max(id)) from answer
>
> is right.

Or possibly:
select setval('answer_id_seq', id) from
 (select id from answer order by id desc limit 1) as foo;

which might if the table gets big enough use an index scan.



Re: A problem with sequences...

From
Tom Lane
Date:
Dmitry Tkach <dmitry@openratings.com> writes:
> Doug McNaught wrote:
>> I'm guessing that 'setval' is getting called more than once here.
>> Your 'LIMIT 1' controls how many rows are returned to the client, but
>> the server is probably generating more rows internally.
>>
> You are right ! That's it! It does get called twice.
> There is even a nice comment in ExecLimit():

BTW, in CVS tip ExecLimit has been rewritten to not do this, so the
query will behave as you expect in 7.4.  Still, functions with
side-effects are really really dangerous in any but the simplest kind
of SELECT, because the planner is pretty cavalier about rearranging
things.  I'd advise doing this instead:

select setval('answer_id_seq', (select id from answer
                                order by id desc limit 1));

Here, you *know* that the setval will be called exactly once.

            regards, tom lane