Thread: 9.2beta1 regression: pg_restore --data-only does not set sequence values any more

Hello all,

while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common
test suite noticed a regression: It seems that pg_restore --data-only
now skips the current value of sequences, so that in the upgraded
database the sequence counter is back to the default. This can lead to
rather serious data errors.

Reproducer:=20

 * Create a db and sequence:

  $ createdb test
  $ psql test -c "CREATE SEQUENCE odd10 INCREMENT BY 2 MINVALUE 1 MAXVALUE =
10 CYCLE"
  CREATE SEQUENCE
=20
 * Advance it two steps:

   $ psql --cluster 9.2/main -Atc "SELECT nextval('odd10')" test
   1
   $ psql --cluster 9.2/main -Atc "SELECT nextval('odd10')" test
   3

 * Dump schema and "test" data:

   $ pg_dumpall -s > /tmp/schema
   $ pg_dump -Fc test > /tmp/test.dump

 * Drop db:
=20
   $ dropdb test

 * Restore DB:

   $ psql template1 < /tmp/schema=20
   $ pg_restore --data-only -d postgres /tmp/test.dump=20

 * Check the counter:

   $ psql -Atc "SELECT nextval('odd10')" test
   1

This is wrong, it should be "5", from the original database. This
worked all the way up to 9.1.


With "pg_restore /tmp/test.dump" (i. e. a full dump), you see

   SELECT pg_catalog.setval('odd10', 3, true);

in the dump, but it is not present in the output of
"pg_restore --data-only /tmp/test.dump". However, when I use the 9.1
version, it works:

   $ /usr/lib/postgresql/9.1/bin/pg_restore --data-only  /tmp/test.dump  | =
grep setval
   SELECT pg_catalog.setval('odd10', 3, true);

Thanks!

Martin

--=20
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
Martin Pitt <mpitt@debian.org> writes:
> while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common
> test suite noticed a regression: It seems that pg_restore --data-only
> now skips the current value of sequences, so that in the upgraded
> database the sequence counter is back to the default.

I believe this is a consequence of commit
a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which introduced the entirely
false assumption that --schema-only and --data-only have something to
do with the order that entries appear in the archive ...

            regards, tom lane
On Wed, May 16, 2012 at 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Martin Pitt <mpitt@debian.org> writes:
> > while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common
> > test suite noticed a regression: It seems that pg_restore --data-only
> > now skips the current value of sequences, so that in the upgraded
> > database the sequence counter is back to the default.
>
> I believe this is a consequence of commit
> a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which introduced the entirely
> false assumption that --schema-only and --data-only have something to
> do with the order that entries appear in the archive ...
>
>
>

Darn, will investigate.

cheers

andrew

On 05/16/2012 10:23 AM, Andrew Dunstan wrote:
>
>
> On Wed, May 16, 2012 at 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
>     Martin Pitt <mpitt@debian.org <mailto:mpitt@debian.org>> writes:
>     > while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common
>     > test suite noticed a regression: It seems that pg_restore
>     --data-only
>     > now skips the current value of sequences, so that in the upgraded
>     > database the sequence counter is back to the default.
>
>     I believe this is a consequence of commit
>     a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which introduced the
>     entirely
>     false assumption that --schema-only and --data-only have something to
>     do with the order that entries appear in the archive ...
>
>
>
> Darn, will investigate.
>
>

[cc -hackers]

Well, the trouble is that we have these pesky SECTION_NONE entries for 
things like comments, security labels and ACLs that need to be dumped in 
the right section, so we can't totally ignore the order. But we could 
(and probably should) ignore the order for making decisions about 
everything BUT those entries.

So, here's a revised plan:
    --section=data will dump exactly TABLE DATA, SEQUENCE SET or BLOBS 
entries    --section=pre-data will dump SECTION_PRE_DATA items (other than 
SEQUENCE SET) plus any immediately following SECTION_NONE items.    --section=post-data will dump everything else.

Comments?


cheers

andrew



On 05/21/2012 02:59 PM, Andrew Dunstan wrote:
>
>
> On 05/16/2012 10:23 AM, Andrew Dunstan wrote:
>>
>>
>> On Wed, May 16, 2012 at 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us
>> <mailto:tgl@sss.pgh.pa.us>> wrote:
>>
>>     Martin Pitt <mpitt@debian.org <mailto:mpitt@debian.org>> writes:
>> > while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common
>> > test suite noticed a regression: It seems that pg_restore
>>     --data-only
>> > now skips the current value of sequences, so that in the upgraded
>> > database the sequence counter is back to the default.
>>
>>     I believe this is a consequence of commit
>>     a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which introduced the
>>     entirely
>>     false assumption that --schema-only and --data-only have
>> something to
>>     do with the order that entries appear in the archive ...
>>
>>
>>
>> Darn, will investigate.
>>
>>
>
> [cc -hackers]
>
> Well, the trouble is that we have these pesky SECTION_NONE entries for
> things like comments, security labels and ACLs that need to be dumped
> in the right section, so we can't totally ignore the order. But we
> could (and probably should) ignore the order for making decisions
> about everything BUT those entries.
>
> So, here's a revised plan:
>
>     --section=data will dump exactly TABLE DATA, SEQUENCE SET or BLOBS
> entries
>     --section=pre-data will dump SECTION_PRE_DATA items (other than
> SEQUENCE SET) plus any immediately following SECTION_NONE items.
>     --section=post-data will dump everything else.
>
>



It turns out there were some infelicities with pg_dump as well as with
pg_restore.

I think the attached patch does the right thing. I'll keep testing -
I'll be happier if other people bang on it too.

cheers

andrew

Attachment
Andrew Dunstan <andrew@dunslane.net> writes:
>>> Martin Pitt <mpitt@debian.org <mailto:mpitt@debian.org>> writes:
>>>> while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common
>>>> test suite noticed a regression: It seems that pg_restore --data-only
>>>> now skips the current value of sequences, so that in the upgraded
>>>> database the sequence counter is back to the default.

> It turns out there were some infelicities with pg_dump as well as with 
> pg_restore.

> I think the attached patch does the right thing. I'll keep testing - 
> I'll be happier if other people bang on it too.

After looking this over, I think the original patch was just
fundamentally wrong and needs to be largely rewritten.  The basic
error was in saying that the existing options --schema-only and
--data-only were equivalent to particular cases of --section, which
they are not.  The proposed new patch does not make this better;
it just makes the logic even more spaghetti-ish.

I think what we need is to rip all that out and treat --section as being
a new option that's not tied to the old ones in any way, but is an
entirely orthogonal filter.  The right place to implement it (for either
pg_dump or pg_restore) is in the TOC-scanning loops in
pg_backup_archiver.c, which can track which section they are in fairly
easily (probably define it as being the current item's section unless
that's SECTION_NONE, in which case use the previous section value).

BTW, I'm thinking we could make that code simpler and faster if the
_tocEntryRequired logic were done only once in an initial pass, and then
we stored the teReqs result into a work field in the TocEntry struct
for use in later passes.

Andrew told me off-list that he would be unavailable due to travel for
awhile, so I will have a go at fixing this.
        regards, tom lane