Thread: [HACKERS] different column orders in regression test database
When you dump out the regression test database and load it back in, a few tables end up with different column orders: Original: Table "public.f_star"Column | Type | Collation | Nullable | Default --------+--------------+-----------+----------+---------class | character(1) | | |aa | integer | | |cc | name | | |ee | smallint | | |ff | polygon | | |f | integer | | |e | integer | | |a | text | | | Reloaded: Table "public.f_star"Column | Type | Collation | Nullable | Default --------+--------------+-----------+----------+---------class | character(1) | | |aa | integer | | |a | text | | |cc | name | | |ee | smallint | | |e | integer | | |ff | polygon | | |f | integer | | | This table is part of a lengthy inheritance chain, so this might be intentional or too hard to fix. This behavior goes back to 9.2 and possibly further. But this is a bit more suspicious: Original: Table "public.mlparted11"Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+---------b | integer | | not null |a | integer | | not null | Partition of: mlparted1 FOR VALUES FROM (2) TO (5) Reloaded: Table "public.mlparted11"Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+---------a | integer | | not null |b | integer | | not null | Partition of: mlparted1 FOR VALUES FROM (2) TO (5) The same applies for other tables in this partitioning group: public.mlparted12, public.mlparted2, public.mlparted4 But the root table public.mlparted matches on both sides. While you can create all kinds of dubious messes with general inheritance, this should probably not be allowed to happen in the restricted setting of partitioning. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > When you dump out the regression test database and load it back in, a > few tables end up with different column orders: > ... > This table is part of a lengthy inheritance chain, so this might be > intentional or too hard to fix. This behavior goes back to 9.2 and > possibly further. Yeah, the variation in f_star and friends is intentional (and very very old). Can't say about the partition tests though. regards, tom lane
On Fri, May 19, 2017 at 7:21 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > But this is a bit more suspicious: > > Original: > > Table "public.mlparted11" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > b | integer | | not null | > a | integer | | not null | > Partition of: mlparted1 FOR VALUES FROM (2) TO (5) > > Reloaded: > > Table "public.mlparted11" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > a | integer | | not null | > b | integer | | not null | > Partition of: mlparted1 FOR VALUES FROM (2) TO (5) > > The same applies for other tables in this partitioning group: > public.mlparted12, public.mlparted2, public.mlparted4 > > But the root table public.mlparted matches on both sides. > > While you can create all kinds of dubious messes with general > inheritance, this should probably not be allowed to happen in the > restricted setting of partitioning. That's because if you attach a partition with a different column ordering, pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF ... command, so the ordering it lost. Example: create table p (a int, b int) partition by list (a); create table c (b int, a int); alter table p attach partition c for values in (42); Then "c" is dumped as: CREATE TABLE c PARTITION OF p FOR VALUES IN (42); If you wanted to preserve column orders for partitions I guess you'd have to teach to to detect the difference (ignoring dropped columns?) and generate the two step create-and-attach commands. -- Thomas Munro http://www.enterprisedb.com
On 5/18/17 16:21, Thomas Munro wrote: > That's because if you attach a partition with a different column > ordering, Is it intentional and sensible to allow that in the first place? Or was it just inherited from inheritance? > pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF > ... command, so the ordering it lost. So it appears that either the above should be prohibited or pg_dump should be fixed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, May 19, 2017 at 10:53 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 5/18/17 16:21, Thomas Munro wrote: >> That's because if you attach a partition with a different column >> ordering, > > Is it intentional and sensible to allow that in the first place? Or was > it just inherited from inheritance? Can't speak for the authors but I'm sure it's intentional. Making an existing table fit into a partitioning hierarchy is a useful thing to be able to do, and you can't reorder columns. >> pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF >> ... command, so the ordering it lost. > > So it appears that either the above should be prohibited or pg_dump > should be fixed. pg_dump already knows how to do create-then-attach for binary upgrades, for a less debatable reason: tuple format must be preserved. To make normal dump/restore preserve the order, we could either make it *always* write create-then-attach, or do it only if required. I'd vote for doing it only if required because of different column order, because I don't want to see 1,000 partitions dumped in "long format" when the short and sweet CREATE... PARTITION OF ... syntax could usually be used. -- Thomas Munro http://www.enterprisedb.com
Peter Eisentraut wrote: > On 5/18/17 16:21, Thomas Munro wrote: > > That's because if you attach a partition with a different column > > ordering, > > Is it intentional and sensible to allow that in the first place? Or was > it just inherited from inheritance? I think it was deliberately allowed. Note that if you have a table with dropped columns which you want to make a partition of another table without them, there will need to be some physical transformation of the tuples anyway in order for reading to work; we certainly don't want to reject such cases. > > pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF > > ... command, so the ordering it lost. > > So it appears that either the above should be prohibited or pg_dump > should be fixed. Are you proposing that if the ordering of the columns of a partition is not identical to that of its parent table, the partition should be dumped as a regular CREATE TABLE followed by ALTER TABLE .. ATTACH PARTITION, instead of a single CREATE TABLE .. PARTITION OF command? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/18/17 19:07, Thomas Munro wrote: > To make normal dump/restore preserve the order, we could either make > it *always* write create-then-attach, or do it only if required. I'd > vote for doing it only if required because of different column order, > because I don't want to see 1,000 partitions dumped in "long format" > when the short and sweet CREATE... PARTITION OF ... syntax could > usually be used. Doing it the long way only when necessary makes sense. Maybe never doing it the long way also makes sense, as long as we're clear that that's what we want. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017/05/19 11:02, Peter Eisentraut wrote: > On 5/18/17 19:07, Thomas Munro wrote: >> To make normal dump/restore preserve the order, we could either make >> it *always* write create-then-attach, or do it only if required. I'd >> vote for doing it only if required because of different column order, >> because I don't want to see 1,000 partitions dumped in "long format" >> when the short and sweet CREATE... PARTITION OF ... syntax could >> usually be used. > > Doing it the long way only when necessary makes sense. Maybe never > doing it the long way also makes sense, as long as we're clear that > that's what we want. I tend to prefer the latter - never doing it the long way (which is what happens today [1]). It's always better for all the partitions to have the same tuple descriptor as the parent in the target database, which is what the short CREATE TABLE .. PARTITION OF syntax would result in. The long format is unavoidable in the case of --binary-upgrade dump mode for obvious reasons. Thanks, Amit [1] create table p (a int, b char) partition by list (a); create table p1 (c int, b char, a int); alter table p1 drop c; alter table p attach partition p1 for values in (1); insert into p values (1, 'a'); select tableoid::regclass, * from p;tableoid | a | b ----------+---+---p1 | 1 | a (1 row) $ pg_dump CREATE TABLE p ( a integer, b character(1) ) PARTITION BY LIST (a); CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1); COPY p1 (b, a) FROM stdin; a 1 \.