Thread: logical column position

logical column position

From
Neil Conway
Date:
I'd like to add a new column to pg_attribute that specifies the
attribute's "logical position" within its relation. The idea here is
to separate the logical order of the columns in a relation from the
on-disk storage of the relation's tuples. This allows us to easily &
quickly change column order, add an additional column before or after
an existing column, etc.

At present, attnum basically does three things: identifies a column
within a relation, indicates which columns are system columns, and
defines the order of a relation's columns. I'd like to move this last
functionality into a separate pg_attribute column named "attpos" (or
"attlogicalpos"):
        - when the table is created, attnum == attpos. System columns
          have attpos < 0, as with attnum. At no point will two
          columns of the same relation have the same attpos.

        - when returning output to the client and no column ordering
          is implied by the query (e.g. "SELECT * ..."), we sort the
          columns in ascending attpos order.

        - when storing a tuple on disk, we don't consider attpos

        - if we want to change the order of the columns in a
          relation, we can do so merely by updating pg_attribute; no
          changes to the on-disk storage of the relation should be
          necessary
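
The scheme in the list above can be sketched roughly as follows, in Python rather than backend C. The Attribute class and the create_table, expand_star, and swap_positions helpers are hypothetical names used only for illustration; none of them are PostgreSQL functions, and system columns are reduced to "attpos <= 0 is skipped".

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    attname: str
    attnum: int  # identity and on-disk position; never changed in this sketch
    attpos: int  # logical position; the only field a reorder touches


def create_table(names):
    """At creation time attnum == attpos, numbered from 1."""
    return [Attribute(n, i, i) for i, n in enumerate(names, start=1)]


def expand_star(attrs):
    """Column names for "SELECT *": user columns in ascending attpos order."""
    user_cols = [a for a in attrs if a.attpos > 0]
    return [a.attname for a in sorted(user_cols, key=lambda a: a.attpos)]


def swap_positions(attrs, name_a, name_b):
    """Exchange two columns' logical positions; attnum is left alone."""
    by_name = {a.attname: a for a in attrs}
    a, b = by_name[name_a], by_name[name_b]
    a.attpos, b.attpos = b.attpos, a.attpos


table = create_table(["id", "name", "email"])
swap_positions(table, "id", "email")
print(expand_star(table))         # logical order after the swap
print([a.attnum for a in table])  # storage order is unchanged
```

Only the catalog-like records change in the swap; nothing that stands in for tuple storage is rewritten.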
 

A few notes:
 (a) ISTM this should also apply to COPY TO and COPY FROM if the user
     didn't supply a column list. Is this reasonable? It would break
     dumps of the table's contents, but then again, dumps aren't
     guaranteed to remain valid over arbitrary changes to the table's
     meta-data.

 (b) Using the above scheme that attnum == attpos initially, there
     won't be any gaps in the sequence of attpos values. That means
     that if, for example, we want to move the column in position 50
     to position 1, we'll need to change the positions of all the
     columns in positions [1..49] (and suffer the resulting MVCC
     bloat in pg_attribute). Changing the column order is hardly a
     performance critical operation, so that might be acceptable.

     If we want to avoid this, one easy (but arguably unclean) way to
     do so would be to make the initial value of attpos == attnum *
     1000, and make attpos an int4 rather than an int2. Then, we can
     do most column reordering operations with only a single
     pg_attribute update -- in the worst-case that enough
     re-orderings are done that we overflow the 999 "padding"
     positions, we can just fall-back to doing multiple pg_attribute
     updates. Is this worth doing, and/or is there a better way to
     achieve the same effect?
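
A rough sketch of the padded numbering in note (b), written in Python for brevity. The GAP constant mirrors the proposed factor of 1000; the move_before helper and the dict standing in for pg_attribute rows are assumptions made for illustration only.

```python
GAP = 1000  # padding between initial positions, as proposed in note (b)


def move_before(positions, name, before):
    """Place column `name` logically just before column `before`.

    positions maps column name -> attpos. Returns the number of
    entries (stand-ins for pg_attribute rows) that had to change.
    """
    target = positions[before]
    # Largest position below the target, ignoring the column being moved.
    lower = max((p for n, p in positions.items()
                 if p < target and n != name), default=0)
    if target - lower > 1:
        # Padding is still available: a single update suffices.
        positions[name] = (lower + target) // 2
        return 1
    # Padding exhausted: fall back to renumbering every column.
    order = sorted((n for n in positions if n != name), key=positions.get)
    order.insert(order.index(before), name)
    changed = 0
    for i, n in enumerate(order, start=1):
        if positions[n] != i * GAP:
            positions[n] = i * GAP
            changed += 1
    return changed


positions = {f"c{i}": i * GAP for i in range(1, 51)}
rows_updated = move_before(positions, "c50", "c1")
print(rows_updated, sorted(positions, key=positions.get)[:3])
```

With dense numbering (no gaps) every call takes the fallback branch, which is the multi-row update described in the first paragraph of note (b).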
 
 (c) Do I need to consider inheritance?

Comments are welcome.

-Neil



Re: logical column position

From
Bruce Momjian
Date:
Neil Conway wrote:
> I'd like to add a new column to pg_attribute that specifies the
> attribute's "logical position" within its relation. The idea here is
> to separate the logical order of the columns in a relation from the
> on-disk storage of the relation's tuples. This allows us to easily &
> quickly change column order, add an additional column before or after
> an existing column, etc.
> 
> At present, attnum basically does three things: identifies a column
> within a relation, indicates which columns are system columns, and
> defines the order of a relation's columns. I'd like to move this last
> functionality into a separate pg_attribute column named "attpos" (or
> "attlogicalpos"):
> 
>          - when the table is created, attnum == attpos. System columns
>            have attpos < 0, as with attnum. At no point will two
>            columns of the same relation have the same attpos.
> 
>          - when returning output to the client and no column ordering
>            is implied by the query (e.g. "SELECT * ..."), we sort the
>            columns in ascending attpos order.

Seems the only cases where attpos would be used would be SELECT *,
INSERT with no column list, and COPY --- seems like a nifty feature.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: logical column position

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> At present, attnum basically does three things: identifies a column
> within a relation, indicates which columns are system columns, and
> defines the order of a relation's columns. I'd like to move this last
> functionality into a separate pg_attribute column named "attpos" (or
> "attlogicalpos"):

"attpos" is a horrid choice of name, because no one will be able to
remember which of "attnum" and "attpos" is which.  Pick a more distinct
name.  Offhand the best thing I can think of is "attlognum" or "attlogpos".

>          - when the table is created, attnum == attpos. System columns
>            have attpos < 0, as with attnum. At no point will two
>            columns of the same relation have the same attpos.

What are you going to do with deleted columns?  I'd be inclined to give
them all attlogpos = 0, but that destroys your last comment.

>   (a) ISTM this should also apply to COPY TO and COPY FROM if the user
>       didn't supply a column list. Is this reasonable?

Yes, also INSERT INTO, also the implicit ordering of output columns of a
JOIN, also the matching of aliases to columns in a FROM-list alias,
probably one or two other places.  SQL exposes column ordering in more
places than just "SELECT *".

>       If we want to avoid this, one easy (but arguably unclean) way to
>       do so would be to make the initial value of attpos == attnum *
>       1000, and make attpos an int4 rather than an int2. Then, we can
>       do most column reordering operations with only a single
>       pg_attribute update -- in the worst-case that enough
>       re-orderings are done that we overflow the 999 "padding"
>       positions, we can just fall-back to doing multiple pg_attribute
>       updates. Is this worth doing, and/or is there a better way to
>       achieve the same effect?

That seems horribly messy.  Just renumber.
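
For a sense of what plain renumbering costs, here is a small Python sketch. It assumes dense logical positions (list index + 1) and a hypothetical move_to helper; the count it returns stands in for the number of pg_attribute rows that would be updated.

```python
def move_to(order, name, new_pos):
    """Move `name` to 1-based logical position `new_pos` by renumbering.

    order is the list of column names in logical order, so a column's
    logical position is its index + 1. Returns the new order and the
    number of columns whose position changed.
    """
    old = order.index(name)
    new_order = order[:old] + order[old + 1:]
    new_order.insert(new_pos - 1, name)
    # A position needs an update wherever a different column now sits there.
    updated = sum(1 for a, b in zip(order, new_order) if a != b)
    return new_order, updated


order = [f"c{i}" for i in range(1, 51)]
new_order, updated = move_to(order, "c50", 1)
print(new_order[:3], updated)
```

Moving the last of 50 columns to the front touches all 50 positions, which is the worst case for a table of that width.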

>   (c) Do I need to consider inheritance?

Yes.  I think it'd be good if things were constrained so that columns
1..n in a parent table always matched columns 1..n in every child,
which is not true now after adding/dropping columns.  That would make it
easier/cheaper/more reliable to match up which child columns are to be
referenced in an inherited query (see adjust_inherited_attrs).  I think
the effective constraints would have to be about the same as what we now
impose on column names in an inheritance hierarchy.

You have not presented any proposal for exactly what ALTER TABLE
operations would be offered to manipulate the column positions.
My recollection is that some consensus was reached on that point
in the last thread we had on this issue --- have you consulted the
archives?
        regards, tom lane


Re: logical column position

From
Alvaro Herrera Munoz
Date:
On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:

> >   (c) Do I need to consider inheritance?
> 
> Yes.  I think it'd be good if things were constrained so that columns
> 1..n in a parent table always matched columns 1..n in every child,
> which is not true now after adding/dropping columns.  That would make it
> easier/cheaper/more reliable to match up which child columns are to be
> referenced in an inherited query (see adjust_inherited_attrs).

No way, because of multiple inheritance.  Each child should have an
attparentnum, which would point to the parent's attnum for this to work ...

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)


Re: logical column position

From
Tom Lane
Date:
Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:
> On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:
>>>   (c) Do I need to consider inheritance?
>> 
>> Yes.  I think it'd be good if things were constrained so that columns
>> 1..n in a parent table always matched columns 1..n in every child,
>> which is not true now after adding/dropping columns.

> No way, because of multiple inheritance.  Each child should have an
> attparentnum, which would point to the parent's attnum for this to work ...

Hm, good point.  And I think we merge identically-named columns
inherited from different parents, which would mean that "attparentnum"
wouldn't have a unique value anyway.
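
A toy illustration of that point, in Python. It assumes columns are merged purely by name across parents; inherit_columns and the parent tables p1 and p2 are hypothetical, and type checking of merged columns is ignored.

```python
def inherit_columns(parents):
    """Merge identically-named columns inherited from several parents.

    parents maps parent table name -> column names in attnum order.
    Returns child column name -> list of (parent, parent_attnum) pairs,
    with child columns in first-seen order.
    """
    child = {}
    for parent, cols in parents.items():
        for attnum, col in enumerate(cols, start=1):
            child.setdefault(col, []).append((parent, attnum))
    return child


child = inherit_columns({"p1": ["a", "b"], "p2": ["b", "c"]})
for col, origins in child.items():
    print(col, origins)
```

Column "b" ends up with two origins, at attnum 2 in p1 and attnum 1 in p2, so a single scalar attparentnum could not describe it.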

Perhaps rearranging a parent's columns shouldn't have *any* direct
effect on a child?  Seems ugly though.
        regards, tom lane


Re: logical column position

From
Neil Conway
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> "attpos" is a horrid choice of name, because no one will be able to
> remember which of "attnum" and "attpos" is which.  Pick a more
> distinct name.  Offhand the best thing I can think of is "attlognum"
> or "attlogpos".

Actually, I deliberately chose attpos rather than attlognum (which is
what some people had been calling this feature earlier). My reasoning
was that the "logical number" is really a nonsensical idea: we just
invented it on the spot. In contrast, a "position" is a fairly natural
thing for an attribute to have -- it's a notion with some counterpart
in the real world. To me, at least, it seems intuitive that an
"attnum" would identify a column whereas an "attpos" would specify the
column's position.

I'm happy to change the name if there's a consensus that attpos isn't
a good choice -- what does everyone think?

> What are you going to do with deleted columns?  I'd be inclined to
> give them all attlogpos = 0, but that destroys your last comment.

I hadn't planned to do anything in particular for deleted columns:
since they are never displayed to the user, does it matter what their
attpos is?

In any event, the property that no two columns in a table have the
same logical number isn't important anyway.

> You have not presented any proposal for exactly what ALTER TABLE
> operations would be offered to manipulate the column positions.

I'd like to get the backend storage side of things implemented
first. I'll take a look at the archives before I do any UI work --
thanks for the suggestion.

-Neil



Re: logical column position

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Actually, I deliberately chose attpos rather than attlognum (which is
> what some people had been calling this feature earlier). My reasoning
> was that the "logical number" is really a nonsensical idea: we just
> invented it on the spot.

True ...

> In contrast, a "position" is a fairly natural
> thing for an attribute to have -- it's a notion with some counterpart
> in the real world.

But "position" could at least as logically be considered to mean the
physical position in the tuple.  I still say that these names are ripe
for confusion.

I don't have a better choice of name offhand, but if we spend 1% of the
time already spent arguing about these issues on finding a better name,
I'm sure we can think of one ;-)
        regards, tom lane


Re: logical column position

From
Rod Taylor
Date:
> I don't have a better choice of name offhand, but if we spend 1% of the
> time already spent arguing about these issues on finding a better name,
> I'm sure we can think of one ;-)

virtual (attvirtnum)
external (attextnum)

atttisoywnum -> attribute this is the one you want number



Re: logical column position

From
Robert Treat
Date:
On Thu, 2003-11-20 at 23:27, Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > Actually, I deliberately chose attpos rather than attlognum (which is
> > what some people had been calling this feature earlier). My reasoning
> > was that the "logical number" is really a nonsensical idea: we just
> > invented it on the spot.
> 
> True ...
> 
> > In contrast, a "position" is a fairly natural
> > thing for an attribute to have -- it's a notion with some counterpart
> > in the real world.
> 
> But "position" could at least as logically be considered to mean the
> physical position in the tuple.  I still say that these names are ripe
> for confusion.
> 
> I don't have a better choice of name offhand, but if we spend 1% of the
> time already spent arguing about these issues on finding a better name,
> I'm sure we can think of one ;-)
> 

Seems merging the two would work... attlogpos, the attribute's logical
position.

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL



Re: logical column position

From
Neil Conway
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Seems merging the two would work... attlogpos, the attribute's
> logical position.

Unless anyone has any further objections, I'll switch to using attlogpos.

-Neil