Thread: [HACKERS] different column orders in regression test database

[HACKERS] different column orders in regression test database

From

Peter Eisentraut

Date:

18 May 2017, 22:21:44

When you dump out the regression test database and load it back in, a
few tables end up with different column orders:

Original:
                Table "public.f_star"Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------class  | character(1) |           |          |aa     | integer
   |           |          |cc     | name         |           |          |ee     | smallint     |           |
|ff    | polygon      |           |          |f      | integer      |           |          |e      | integer      |
     |          |a      | text         |           |          |
 

Reloaded:
                Table "public.f_star"Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------class  | character(1) |           |          |aa     | integer
   |           |          |a      | text         |           |          |cc     | name         |           |
|ee    | smallint     |           |          |e      | integer      |           |          |ff     | polygon      |
     |          |f      | integer      |           |          |
 

This table is part of a lengthy inheritance chain, so this might be
intentional or too hard to fix.  This behavior goes back to 9.2 and
possibly further.

But this is a bit more suspicious:

Original:
            Table "public.mlparted11"Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------b      | integer |           | not null |a      | integer |
| not null |
 
Partition of: mlparted1 FOR VALUES FROM (2) TO (5)

Reloaded:
            Table "public.mlparted11"Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------a      | integer |           | not null |b      | integer |
| not null |
 
Partition of: mlparted1 FOR VALUES FROM (2) TO (5)

The same applies for other tables in this partitioning group:
public.mlparted12, public.mlparted2, public.mlparted4

But the root table public.mlparted matches on both sides.

While you can create all kinds of dubious messes with general
inheritance, this should probably not be allowed to happen in the
restricted setting of partitioning.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] different column orders in regression test database

From

Tom Lane

Date:

18 May 2017, 22:31:34

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> When you dump out the regression test database and load it back in, a
> few tables end up with different column orders:
> ...
> This table is part of a lengthy inheritance chain, so this might be
> intentional or too hard to fix.  This behavior goes back to 9.2 and
> possibly further.

Yeah, the variation in f_star and friends is intentional (and very
very old).

Can't say about the partition tests though.
        regards, tom lane

Re: [HACKERS] different column orders in regression test database

From

Thomas Munro

Date:

18 May 2017, 23:21:46

On Fri, May 19, 2017 at 7:21 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> But this is a bit more suspicious:
>
> Original:
>
>              Table "public.mlparted11"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  b      | integer |           | not null |
>  a      | integer |           | not null |
> Partition of: mlparted1 FOR VALUES FROM (2) TO (5)
>
> Reloaded:
>
>              Table "public.mlparted11"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  a      | integer |           | not null |
>  b      | integer |           | not null |
> Partition of: mlparted1 FOR VALUES FROM (2) TO (5)
>
> The same applies for other tables in this partitioning group:
> public.mlparted12, public.mlparted2, public.mlparted4
>
> But the root table public.mlparted matches on both sides.
>
> While you can create all kinds of dubious messes with general
> inheritance, this should probably not be allowed to happen in the
> restricted setting of partitioning.

That's because if you attach a partition with a different column
ordering, pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
... command, so the ordering it lost.

Example:

create table p (a int, b int) partition by list (a);
create table c (b int, a int);
alter table p attach partition c for values in (42);

Then "c" is dumped as:

CREATE TABLE c PARTITION OF p
FOR VALUES IN (42);

If you wanted to preserve column orders for partitions I guess you'd
have to teach to to detect the difference (ignoring dropped columns?)
and generate the two step create-and-attach commands.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] different column orders in regression test database

From

Peter Eisentraut

Date:

19 May 2017, 01:53:26

On 5/18/17 16:21, Thomas Munro wrote:
> That's because if you attach a partition with a different column
> ordering,

Is it intentional and sensible to allow that in the first place?  Or was
it just inherited from inheritance?

> pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
> ... command, so the ordering it lost.

So it appears that either the above should be prohibited or pg_dump
should be fixed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] different column orders in regression test database

From

Thomas Munro

Date:

19 May 2017, 02:07:59

On Fri, May 19, 2017 at 10:53 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 5/18/17 16:21, Thomas Munro wrote:
>> That's because if you attach a partition with a different column
>> ordering,
>
> Is it intentional and sensible to allow that in the first place?  Or was
> it just inherited from inheritance?

Can't speak for the authors but I'm sure it's intentional.  Making an
existing table fit into a partitioning hierarchy is a useful thing to
be able to do, and you can't reorder columns.

>> pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
>> ... command, so the ordering it lost.
>
> So it appears that either the above should be prohibited or pg_dump
> should be fixed.

pg_dump already knows how to do create-then-attach for binary
upgrades, for a less debatable reason: tuple format must be preserved.
To make normal dump/restore preserve the order, we could either make
it *always* write create-then-attach, or do it only if required.  I'd
vote for doing it only if required because of different column order,
because I don't want to see 1,000 partitions dumped in "long format"
when the short and sweet CREATE... PARTITION OF ... syntax could
usually be used.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] different column orders in regression test database

From

Alvaro Herrera

Date:

19 May 2017, 02:11:15

Peter Eisentraut wrote:
> On 5/18/17 16:21, Thomas Munro wrote:
> > That's because if you attach a partition with a different column
> > ordering,
> 
> Is it intentional and sensible to allow that in the first place?  Or was
> it just inherited from inheritance?

I think it was deliberately allowed.  Note that if you have a table with
dropped columns which you want to make a partition of another table
without them, there will need to be some physical transformation of
the tuples anyway in order for reading to work; we certainly don't want
to reject such cases.

> > pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
> > ... command, so the ordering it lost.
> 
> So it appears that either the above should be prohibited or pg_dump
> should be fixed.

Are you proposing that if the ordering of the columns of a partition is
not identical to that of its parent table, the partition should be
dumped as a regular CREATE TABLE followed by ALTER TABLE .. ATTACH PARTITION,
instead of a single CREATE TABLE .. PARTITION OF command?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] different column orders in regression test database

From

Peter Eisentraut

Date:

19 May 2017, 05:02:45

On 5/18/17 19:07, Thomas Munro wrote:
> To make normal dump/restore preserve the order, we could either make
> it *always* write create-then-attach, or do it only if required.  I'd
> vote for doing it only if required because of different column order,
> because I don't want to see 1,000 partitions dumped in "long format"
> when the short and sweet CREATE... PARTITION OF ... syntax could
> usually be used.

Doing it the long way only when necessary makes sense.  Maybe never
doing it the long way also makes sense, as long as we're clear that
that's what we want.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] different column orders in regression test database

From

Amit Langote

Date:

19 May 2017, 07:40:49

On 2017/05/19 11:02, Peter Eisentraut wrote:
> On 5/18/17 19:07, Thomas Munro wrote:
>> To make normal dump/restore preserve the order, we could either make
>> it *always* write create-then-attach, or do it only if required.  I'd
>> vote for doing it only if required because of different column order,
>> because I don't want to see 1,000 partitions dumped in "long format"
>> when the short and sweet CREATE... PARTITION OF ... syntax could
>> usually be used.
> 
> Doing it the long way only when necessary makes sense.  Maybe never
> doing it the long way also makes sense, as long as we're clear that
> that's what we want.

I tend to prefer the latter - never doing it the long way (which is what
happens today [1]).  It's always better for all the partitions to have the
same tuple descriptor as the parent in the target database, which is what
the short CREATE TABLE .. PARTITION OF syntax would result in.  The long
format is unavoidable in the case of --binary-upgrade dump mode for
obvious reasons.

Thanks,
Amit

[1]

create table p (a int, b char) partition by list (a);
create table p1 (c int, b char, a int);
alter table p1 drop c;
alter table p attach partition p1 for values in (1);
insert into p values (1, 'a');
select tableoid::regclass, * from p;tableoid | a | b
----------+---+---p1       | 1 | a
(1 row)

$ pg_dump

CREATE TABLE p (   a integer,   b character(1)
)
PARTITION BY LIST (a);

CREATE TABLE p1 PARTITION OF p
FOR VALUES IN (1);

COPY p1 (b, a) FROM stdin;
a    1
\.