Thread: pg_dump future problem.

pg_dump future problem.

From
Christopher Kings-Lynne
Date:
I see an obvious problem in the way pg_dump dumps serials.  At the moment,
we have length 64 identifiers.  If someone has a 64 character table name
and adds a serial to it, it will get a truncated sequence name, with _seq
o the end.

pg_dump will dump it like this:

CREATE TABLE really_long_name (a SERIAL UNIQUE
);

SELECT SETVAL('really_long_na_seq', 120);

However, if we up 7.4, say, to use 128 character identifiers, then this
restore will fail, as the sequence name will NOT be truncated and the
setval() call will fail.

We really need:

ALTER SEQUENCE ON really_long_name(a) ....

Chris



Re: pg_dump future problem.

From
Philip Warner
Date:
At 12:14 PM 4/05/2003 +0800, Christopher Kings-Lynne wrote:
>CREATE TABLE really_long_name (
>         a SERIAL UNIQUE
>);

At least in production versions, pg_dump does not do this. It will dump it as   a integer default
nextval('really_long_na_seq')
)

not sure about CVS.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: pg_dump future problem.

From
Christopher Kings-Lynne
Date:
> At 12:14 PM 4/05/2003 +0800, Christopher Kings-Lynne wrote:
> >CREATE TABLE really_long_name (
> >         a SERIAL UNIQUE
> >);
>
> At least in production versions, pg_dump does not do this. It will dump it as
>     a integer default nextval('really_long_na_seq')
> )

No, 7.3 release dumps tables like this.

Assuming that you've use contrib/adddepend to add the dependencies from
your old tables into the dependencies table.  (Which you should do...)

Even if you don't, then any tables you create under 7.3 will certainly be
dumped like that...

Chris



Re: pg_dump future problem.

From
Philip Warner
Date:
At 01:49 PM 4/05/2003 +0800, Christopher Kings-Lynne wrote:
>No, 7.3 release dumps tables like this.

Good point. I had not even noticed, but I had noticed something odd in the 
last few dumps.

My preference for a solution to this problem would be to remove even more 
implementation knowledge from pg_dump, and add a command like:
    alter table really_long_name set next a = <next sequence value>;

Assuming this kind of approach is acceptable, I'm even volunteering to do 
it since it seems small and hits a few areas I have not had a chance to 
break yet. Unless of course someone else is already in the relevant areas.

It has the advantage of bypassing the problem and, if we ever redesign 
sequences, there is no impact on pg_dump.







----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: pg_dump future problem.

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> I see an obvious problem in the way pg_dump dumps serials.  At the moment,
> we have length 64 identifiers.  If someone has a 64 character table name
> and adds a serial to it, it will get a truncated sequence name, with _seq
> o the end.
> However, if we up 7.4, say, to use 128 character identifiers, then this
> restore will fail, as the sequence name will NOT be truncated and the
> setval() call will fail.

<yawn> ... AFAIR, we have heard no reports of such problems with the
32-to-64 name conversion.  The odds of problems with a (hypothetical)
future increase to 128 would be far less.

I think we have more immediate problems to worry about.
        regards, tom lane



Re: pg_dump future problem.

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> My preference for a solution to this problem would be to remove even more 
> implementation knowledge from pg_dump, and add a command like:
>      alter table really_long_name set next a = <next sequence value>;

And the is_called flag fits into this where?
        regards, tom lane



Re: pg_dump future problem.

From
Philip Warner
Date:
At 10:10 AM 4/05/2003 -0400, Tom Lane wrote:
>And the is_called flag fits into this where?

My recollection is that is_called is used to cover the boundary case where 
the 'current' value should not be incremented before being returned, but my 
memory is hazy. If this is the case, can we not just set it to true and set 
the value to (next - 1) unless (next - 1) < minv, in which case we set it 
to minv and set is_called to false?

Note that for the purpose of serial values we do not need to set the 
sequence exactly as it was internally, we just need to make sure that the 
next allocated value will be what we expect.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: pg_dump future problem.

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 10:10 AM 4/05/2003 -0400, Tom Lane wrote:
>> And the is_called flag fits into this where?

> My recollection is that is_called is used to cover the boundary case where 
> the 'current' value should not be incremented before being returned, but my 
> memory is hazy. If this is the case, can we not just set it to true and set 
> the value to (next - 1) unless (next - 1) < minv, in which case we set it 
> to minv and set is_called to false?

This would fail to cover the case where the user has used setval() to
set is_called false and last_value to something other than minv.

> Note that for the purpose of serial values we do not need to set the 
> sequence exactly as it was internally, we just need to make sure that the 
> next allocated value will be what we expect.

I disagree with the notion that pg_dump need not dump legal
configurations of the database.

ISTM the real issue here is that we don't want pg_dump dumps to
hard-wire knowledge of the algorithm for deriving a sequence name from a
table and column name.  Why not attack that straightforwardly: provide a
backend function that computes one from the other?  Then the SETVAL
calls could become something like
SELECT setval(serial_seq_name('table', 'column'), 42, true);

and we're not sacrificing anything --- nor going through the very
substantial overhead of creating a new ALTER TABLE variant.
        regards, tom lane



Re: pg_dump future problem.

From
Philip Warner
Date:
At 09:45 AM 5/05/2003 -0400, Tom Lane wrote:

>This would fail to cover the case where the user has used setval() to
>set is_called false and last_value to something other than minv.

In this case I think they have shot themselves in the foot; the docs 
clearly state that setval/3 is for internal pg_dump use only. It is also 
not to be relied upon when there are more than one connection to the db 
updating the sequence.


> > Note that for the purpose of serial values we do not need to set the
> > sequence exactly as it was internally, we just need to make sure that the
> > next allocated value will be what we expect.
>
>I disagree with the notion that pg_dump need not dump legal
>configurations of the database.

What I am suggesting is that we make SERIAL sequences more black-box like 
so that we have less reliance on specific implementation details in other 
pieces of code (pg_dump).


>         SELECT setval(serial_seq_name('table', 'column'), 42, true);
>
>and we're not sacrificing anything

I think we are: setval/3 was written to fix one mess with sequences 
(inability to set is_called), now we are suggesting another function to 
patch another (small) hole. ISTM that a more implementation-independant 
approach might be needed. I am not attached to my solution, but I do think 
it's a good idea to look at what would have been done with a 'green fields' 
design, and then ask: can we do it now? Is it worth it?

AFAICT, is_called is just a kludge to allow sequences to attain minval at 
some point; I'm not sure we should be supporting people actually setting it.


>--- nor going through the very
>substantial overhead of creating a new ALTER TABLE variant.

Admittedly it's more complex, but which makes PostgreSQL a better product?
    ALTER TABLE sometable ALTER COLUMN somecolumn SET NEXT 42;
or    SELECT setval(serial_seq_name('sometable', 'somecolumn'), 42, true);





----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: pg_dump future problem.

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 09:45 AM 5/05/2003 -0400, Tom Lane wrote:
>> This would fail to cover the case where the user has used setval() to
>> set is_called false and last_value to something other than minv.

> In this case I think they have shot themselves in the foot; the docs 
> clearly state that setval/3 is for internal pg_dump use only.

There is no such statement visible in
http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions-sequence.html
nor do I find it anywhere else in the current documents.

> It is also 
> not to be relied upon when there are more than one connection to the db 
> updating the sequence.

Any more or less so than either two-parameter SETVAL or the proposed
ALTER TABLE?  I don't see how.

> I am not attached to my solution, but I do think 
> it's a good idea to look at what would have been done with a 'green fields' 
> design, and then ask: can we do it now? Is it worth it?

It probably would look different if we were starting from scratch
... but we aren't, and I don't see any problems here that are large
enough to justify starting over.
        regards, tom lane



Re: pg_dump future problem.

From
Philip Warner
Date:
At 10:58 AM 5/05/2003 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>
> > In this case I think they have shot themselves in the foot; the docs
> > clearly state that setval/3 is for internal pg_dump use only.
>
>There is no such statement visible in
>http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions-sequence.html
>nor do I find it anywhere else in the current documents.

Good point. It's only in the source code. I thought I had updated the docs 
as well...


> > It is also
> > not to be relied upon when there are more than one connection to the db
> > updating the sequence.
>
>Any more or less so than either two-parameter SETVAL or the proposed
>ALTER TABLE?  I don't see how.

My recollection is that setting is_called is more fragile than just setting 
the sequence value, so it not wise to use in general.


>It probably would look different if we were starting from scratch
>... but we aren't, and I don't see any problems here that are large
>enough to justify starting over.

I'm not actually suggesting starting over. Just presenting a nicer 
interface and fixing a bug in the process, rather than building yet another 
user-visible function as a band-aid solution.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: pg_dump future problem.

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Good point. It's only in the source code. I thought I had updated the docs 
> as well...

Actually, I think the sequence of events was that we neglected to remove
the statement in the source code when we fixed the documentation.
3-parameter setval is documented because it solves a problem that users
have --- pg_dump is not the only application that needs to do this.

> My recollection is that setting is_called is more fragile than just setting 
> the sequence value, so it not wise to use in general.

Doesn't look that way to me; we're setting several fields of the
sequence record no matter what.  Perhaps your recollection predates
Vadim's last rewrite of the sequence code?

> I'm not actually suggesting starting over. Just presenting a nicer 
> interface and fixing a bug in the process, rather than building yet another 
> user-visible function as a band-aid solution.

But the proposed "nicer interface" *introduces* a bug, namely the
inability to preserve is_called, which is exactly the pg_dump bug
that 3-parameter setval was invented to fix.  I do not want to go
backwards in the name of a "nicer interface".
        regards, tom lane



Re: pg_dump future problem.

From
Philip Warner
Date:
At 11:22 AM 5/05/2003 -0400, Tom Lane wrote:
>But the proposed "nicer interface" *introduces* a bug, namely the
>inability to preserve is_called, which is exactly the pg_dump bug
>that 3-parameter setval was invented to fix.

No, I don't want to remove the setval functions from general sequences. I 
just want to use existing restrictions on the SERIAL data type: by 
definition they start at 0, increase by 1, and are intX. We can ignore the 
current is_called flag and set the min value or is_called flag in such a 
way as to get the desired next value.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/