Thread: pg_dump and sequences - RFC

pg_dump and sequences - RFC

From
Philip Warner
Date:
It recently came to my attention that pg_dump dumps 'CREATE SEQUENCE' and
'SELECT NEXTVAL' commands for both data-only and schema-only output. This
results in problems for users who do the two in separate steps, and seems a
little odd.

Also, I'd be interested to know what the purpose of 'SELECT NEXTVAL' is?

My inclinations is do do the following:

- Issue 'CREATE SEQUENCE...Initial Value 1...' in OID order
- Issue 'SELECT SETVAL...' at end of data load.

This means that a schema-only restore will hgave all sequences set up with
initial value = 1, and a data-only restore will have sequences set
'correctly'.

Does this sound reasonable?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump and sequences - RFC

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> My inclinations is do do the following:

> - Issue 'CREATE SEQUENCE...Initial Value 1...' in OID order
> - Issue 'SELECT SETVAL...' at end of data load.

> This means that a schema-only restore will hgave all sequences set up with
> initial value = 1, and a data-only restore will have sequences set
> 'correctly'.

Seems reasonable, except you should not necessarily use 1; that could
be outside the defined range of the sequence object.  Use its min_value
instead.

It's too bad the sequence object doesn't save the original starting
value, which is what the schema-only restore REALLY should restore.
The min_value is probably close enough for practical purposes ... not
sure that it's worth adding an original_value column just for this.
(It'd be a simple enough change in terms of the code, but I wonder if
it might create compatibility problems for applications that look at
the contents of sequences.)


> Also, I'd be interested to know what the purpose of 'SELECT NEXTVAL' is?

IIRC the point of the nextval() is to ensure that the internal state of
the sequence is correct.  There's a bool "is_called" in the sequence
that means something like "I've been nextval()'d at least once", and the
only clean way to make that become set is to issue a nextval.  You can
watch the behavior by doing "select * from sequenceobject" between
sequence commands --- it looks like the first nextval() simply sets
is_called without changing last_value, and then subsequent nextval()s
increment last_value.  (This peculiar arrangement makes it possible
to have a starting value equal to MININT, should you want to do so.)
So pg_dump needs to make sure it restores the correct setting of both
fields.

This is pretty grotty because it looks like there's no way to clear
is_called again, short of dropping and recreating the sequence.
So unless you want to do that always, a data-only restore couldn't
guarantee to restore the state of a virgin sequence.
        regards, tom lane


Re: pg_dump and sequences - RFC

From
Philip Warner
Date:
At 10:36 28/09/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> My inclinations is do do the following:
>
>> - Issue 'CREATE SEQUENCE...Initial Value 1...' in OID order
>> - Issue 'SELECT SETVAL...' at end of data load.
>
>Seems reasonable, except you should not necessarily use 1; that could
>be outside the defined range of the sequence object.  Use its min_value
>instead.

OK. Given the discussion of 'select nextval', do you know if 'select
setval' will set the is_called flag? If not should I:


Issue 'CREATE SEQUENCE...Initial Value <MINVAL>...' in OID order

if (is_called was set AND we've loaded any data) then
   Issue 'SELECT NEXTVAL...' at end of data load, and *before* setval.   Issue 'SELECT SETVAL...' at end of data load.

endif


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump and sequences - RFC

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> OK. Given the discussion of 'select nextval', do you know if 'select
> setval' will set the is_called flag?

Looks like it does, both by experiment and by reading the code.
So if you issue a setval() you don't need a nextval() as well.

However you still have the problem that you can't recreate the
state of a virgin (never-nextval'd) sequence this way.  The
existing pg_dump code is correct, in that it will reproduce the
state of a sequence whether virgin or not.  A data-only reload
would fail to make that guarantee unless you drop and recreate
the sequence.
        regards, tom lane


Re: pg_dump and sequences - RFC

From
Philip Warner
Date:
At 11:01 28/09/00 -0400, Tom Lane wrote:
>A data-only reload
>would fail to make that guarantee unless you drop and recreate
>the sequence.

Will this cause problems in an existing database because the sequence OID
changes?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump and sequences - RFC

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 11:01 28/09/00 -0400, Tom Lane wrote:
>> A data-only reload
>> would fail to make that guarantee unless you drop and recreate
>> the sequence.

> Will this cause problems in an existing database because the sequence OID
> changes?

Hmm, good point.  There isn't any real easy way to refer to a sequence
by OID --- the sequence functions only accept names --- but I suppose
someone out there might be doing something with sequence OIDs.

Perhaps the real answer is to extend the set of sequence functions so
that it's possible to set/clear is_called directly.  Perhaps a variant
setval() with an additional, boolean argument?
        regards, tom lane


Re: pg_dump and sequences - RFC

From
Philip Warner
Date:
At 11:17 28/09/00 -0400, Tom Lane wrote:
>
>Hmm, good point.  There isn't any real easy way to refer to a sequence
>by OID --- the sequence functions only accept names --- but I suppose
>someone out there might be doing something with sequence OIDs.

So long as the backend & metadata don't rely on the OID, then it's 99.9%
safe, I'd guess. I'd be happy to go with this, and do a function later
if/when necessary (see below).


>Perhaps the real answer is to extend the set of sequence functions so
>that it's possible to set/clear is_called directly.  Perhaps a variant
>setval() with an additional, boolean argument?

This would be something I'd like to do as a learning exercise. However,
aren't we 2 days from beta? Is this enough time to learn how to add a
function to the backend?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump and sequences - RFC

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> This would be something I'd like to do as a learning exercise. However,
> aren't we 2 days from beta? Is this enough time to learn how to add a
> function to the backend?

In practice, you've probably got a week.  I believe Marc is planning to
be out of town for a week starting tomorrow, and he's not going to be
pushing out a beta till he gets back.

(Besides, I'm not quite done with subselect-in-FROM ;-))

I'd recommend going for the function.
        regards, tom lane