Thread: logical column position

logical column position

From
Neil Conway
Date:
I'd like to add a new column to pg_attribute that specifies the
attribute's "logical position" within its relation. The idea here is
to separate the logical order of the columns in a relation from the
on-disk storage of the relation's tuples. This allows us to easily &
quickly change column order, add an additional column before or after
an existing column, etc.

At present, attnum basically does three things: identifies an column
within a relation, indicates which columns are system columns, and
defines the order of a relation's columns. I'd like to move this last
functionality into a separate pg_attribute column named "attpos" (or
"attlogicalpos"):
        - when the table is created, attnum == attpos. System columns          have attpos < 0, as with attnum. At no
pointwill two          columns of the same relation have the same attpos.
 
        - when returning output to the client and no column ordering          is implied by the query (e.g. "SELECT *
..."),we sort the          columns in ascending attpos order.
 
        - when storing a tuple on disk, we don't consider attpos
        - if we want to change the order of the column's in a          relation, we can do so merely by updating
pg_attribute;no          changes to the on-disk storage of the relation should be          necessary
 

A few notes:
 (a) ISTM this should also apply to COPY TO and COPY FROM if the user     didn't supply a column list. Is this
reasonable?It would break     dumps of the table's contents, but then again, dumps aren't     guaranteed to remain
validover arbitrary changes to the table's     meta-data.
 
 (b) Using the above scheme that attnum == attpos initially, there     won't be any gaps in the sequence of attpos
values.That means     that if, for example, we want to move the column in position 50     to position 1, we'll need to
changethe position's of all the     columns in positions [1..49] (and suffer the resulting MVCC     bloat in
pg_attribute).Changing the column order is hardly a     performance critical operation, so that might be acceptable.
 
     If we want to avoid this, one easy (but arguably unclean) way to     do so would be to make the initial value of
attpos== attnum *     1000, and make attpos an int4 rather than an int2. Then, we can     do most column reordering
operationswith only a single     pg_attribute update -- in the worst-case that enough     re-orderings are done that we
overflowthe 999 "padding"     positions, we can just fall-back to doing multiple pg_attribute     updates. Is this
worthdoing, and/or is there a better way to     achieve the same effect?
 
 (c) Do I need to consider inheritance?

Comments are welcome.

-Neil



Re: logical column position

From
Peter Eisentraut
Date:
Neil Conway writes:

>   (b) Using the above scheme that attnum == attpos initially, there
>       won't be any gaps in the sequence of attpos values. That means
>       that if, for example, we want to move the column in position 50
>       to position 1, we'll need to change the position's of all the
>       columns in positions [1..49] (and suffer the resulting MVCC
>       bloat in pg_attribute). Changing the column order is hardly a
>       performance critical operation, so that might be acceptable.

I don't think you can speak of "bloat" for pg_attribute.  But you can
speak of a problem when you want to do the old col = col + 1 in the
presence of a unique index.

>   (c) Do I need to consider inheritance?

Inheritance is based on column names, so initially no, but if there is a
command to alter the column order, then it should have an ONLY option.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: logical column position

From
Jon Jensen
Date:
On Mon, 17 Nov 2003, Neil Conway wrote:

> I'd like to add a new column to pg_attribute that specifies the
> attribute's "logical position" within its relation. The idea here is
> to separate the logical order of the columns in a relation from the
> on-disk storage of the relation's tuples. This allows us to easily &
> quickly change column order, add an additional column before or after
> an existing column, etc.

That sounds excellent!

>   (a) ISTM this should also apply to COPY TO and COPY FROM if the user
>       didn't supply a column list. Is this reasonable? It would break
>       dumps of the table's contents, but then again, dumps aren't
>       guaranteed to remain valid over arbitrary changes to the table's
>       meta-data.

You're just saying it'd break old dumps, right? I'd assume COPY FROM would
use attpos ordering when writing out columns, or that every user-visible
interaction with the table pretends the columns are in attpos order. So
dumps would break no more or less than when adding or dropping a column
currently, right?

Jon


Re: logical column position

From
Neil Conway
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I don't think you can speak of "bloat" for pg_attribute.  But you
> can speak of a problem when you want to do the old col = col + 1 in
> the presence of a unique index.

I'm sorry, but I'm not sure what either of these comments mean -- can
you elaborate?

-Neil



Re: logical column position

From
Neil Conway
Date:
Jon Jensen <jon@endpoint.com> writes:
> You're just saying it'd break old dumps, right? I'd assume COPY FROM
> would use attpos ordering when writing out columns, or that every
> user-visible interaction with the table pretends the columns are in
> attpos order. So dumps would break no more or less than when adding
> or dropping a column currently, right?

Right -- AFAICS, the only change in COPY compatibility would be if you
COPY TO'd a table and then changed the logical column order in some
fashion, you would no longer be able to restore the dump (unless you
specified a column list for the COPY FROM -- which, btw, pg_dump
does). I don't think it will be a problem, I just thought I'd mention
it.

-Neil



Re: logical column position

From
Christopher Kings-Lynne
Date:
> Right -- AFAICS, the only change in COPY compatibility would be if you
> COPY TO'd a table and then changed the logical column order in some
> fashion, you would no longer be able to restore the dump (unless you
> specified a column list for the COPY FROM -- which, btw, pg_dump
> does). I don't think it will be a problem, I just thought I'd mention
> it.

Well it's the same problem as if you'd dropped a column in the middle of 
the table.

BTW, one main consideration is that all the postgres admin apps will now 
need to support ORDER BY attlognum for 7.5+.

Chris




Re: logical column position

From
Bruce Momjian
Date:
Neil Conway wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I don't think you can speak of "bloat" for pg_attribute.  But you
> > can speak of a problem when you want to do the old col = col + 1 in
> > the presence of a unique index.
> 
> I'm sorry, but I'm not sure what either of these comments mean -- can
> you elaborate?

Peter is pointing out a problem with our unique indexes that might
cause you a problem.  Suppose you have a unique index in attlognum:test=> create table xx( lognum int);CREATE
TABLEtest=>insert into xx values (1);INSERT 17145 1test=> insert into xx values (2);INSERT 17146 1test=> update xx set
lognum= lognum + 1;UPDATE 2test=> create unique index yy on xx (lognum);CREATE INDEXtest=> update xx set lognum =
lognum+ 1;ERROR:  duplicate key violates unique constraint "yy"
 

There is discussion to delay unique constraint failures until commit,
then recheck them to see if they are still valid, sort of like what we
do with deferred triggers.  This would fix the problem because on
commit, those values are unique, but aren't while the rows are updated
invidually.  If we don't get that working you might want to use the 1000
gap idea because it doesn't cause this problem, and we don't support
>1600 columns, so a 1000 gap shouldn't cause a problem and can be
modified later.  If they hit 999 updates, just tell them to dump/reload
the table.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: logical column position

From
Rod Taylor
Date:
On Mon, 2003-11-17 at 20:24, Christopher Kings-Lynne wrote:
> > Right -- AFAICS, the only change in COPY compatibility would be if you
> > COPY TO'd a table and then changed the logical column order in some
> > fashion, you would no longer be able to restore the dump (unless you
> > specified a column list for the COPY FROM -- which, btw, pg_dump
> > does). I don't think it will be a problem, I just thought I'd mention
> > it.
> 
> Well it's the same problem as if you'd dropped a column in the middle of 
> the table.
> 
> BTW, one main consideration is that all the postgres admin apps will now 
> need to support ORDER BY attlognum for 7.5+.

Yeah... how about maintaining attnum for the logical attribute number
and create an attphysnum or something for the physical position instead?

This is more intrusive into the source, but you don't need to teach new
tricks to external entities.



Re: logical column position

From
Peter Eisentraut
Date:
Christopher Kings-Lynne writes:

> BTW, one main consideration is that all the postgres admin apps will now
> need to support ORDER BY attlognum for 7.5+.

But that is only really important if they've also used the ALTER TABLE
RESHUFFLE COLUMNS feature.  So if they make one alteration for 7.5, they
need to do another.  That seems fair.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: logical column position

From
Christopher Kings-Lynne
Date:
>>BTW, one main consideration is that all the postgres admin apps will now
>>need to support ORDER BY attlognum for 7.5+.
> 
> 
> But that is only really important if they've also used the ALTER TABLE
> RESHUFFLE COLUMNS feature.  So if they make one alteration for 7.5, they
> need to do another.  That seems fair.

Good point.

Chris




Re: logical column position

From
Stephan Szabo
Date:
On Tue, 18 Nov 2003, Peter Eisentraut wrote:

> Christopher Kings-Lynne writes:
>
> > BTW, one main consideration is that all the postgres admin apps will now
> > need to support ORDER BY attlognum for 7.5+.
>
> But that is only really important if they've also used the ALTER TABLE
> RESHUFFLE COLUMNS feature.  So if they make one alteration for 7.5, they
> need to do another.  That seems fair.

The ability to reshuffle and to get the correct ordering in a client app
are separate unless we're going to assume that all access goes through
that particular client.  If one user uses psql and shuffles them, a
second user using fooclient may not see the new ordering.


Re: logical column position

From
Dave Cramer
Date:
Will adding the logical attribute number break all of the external
tools? pg_dump, etc are all dependent on attnum now?

Would it be possible to keep the meaning of attnum the same externally
and add another column internally to represent the physical number?

Dave



Re: logical column position

From
Bruce Momjian
Date:
Dave Cramer wrote:
> Will adding the logical attribute number break all of the external
> tools? pg_dump, etc are all dependent on attnum now?
> 
> Would it be possible to keep the meaning of attnum the same externally
> and add another column internally to represent the physical number?

Interesting idea.  It would require a lot of code renaming in the
backend, but it could be done.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: logical column position

From
Christopher Kings-Lynne
Date:
>>Will adding the logical attribute number break all of the external
>>tools? pg_dump, etc are all dependent on attnum now?
>>
>>Would it be possible to keep the meaning of attnum the same externally
>>and add another column internally to represent the physical number?
> 
> 
> Interesting idea.  It would require a lot of code renaming in the
> backend, but it could be done.

Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc 
and all the main developers for those lists read all these posts, I 
think the massive amount of effort to maintain the external interface 
isn't worth it.

I can vouch that it would take me exactly 2 minutes to add support for 
attlognums in phpPgAdmin.

Chris




Re: logical column position

From
Andreas Pflug
Date:
Christopher Kings-Lynne wrote:

>
>>> Will adding the logical attribute number break all of the external
>>> tools? pg_dump, etc are all dependent on attnum now?
>>>
>>> Would it be possible to keep the meaning of attnum the same externally
>>> and add another column internally to represent the physical number?
>>
>>
>>
>> Interesting idea.  It would require a lot of code renaming in the
>> backend, but it could be done.
>
>
> Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc 
> and all the main developers for those lists read all these posts, I 
> think the massive amount of effort to maintain the external interface 
> isn't worth it.
>
> I can vouch that it would take me exactly 2 minutes to add support for 
> attlognums in phpPgAdmin.

Lesson 1 in "SQL for beginners" says "Don't use SELECT * if you rely on 
the order of columns". This discussion is about fixing a problem that 
only inexperienced programmers have. It's like an extra set of wheels on 
top of your car, just in case you drive wrong way...

What's happening if you simply delete a column? Ordering 1,2,3,5? Insert 
another column of the same name, as a previously deleted, will it get 
the old position number? And so on. IMHO, way too much effort for 
working around situations that should be avoided anyway.

Regards,
Andreas




Re: logical column position

From
Dave Cramer
Date:
Andreas,

The point of this is to maintain the column position. I don't think that
an alter of a column type should move the column position. It may be
that programmers should not rely on this, but it happens, and in very
large projects. If we can avoid unexpected side-affects like moving the
columns position, then I think we should.

Dave


On Wed, 2003-11-19 at 12:30, Andreas Pflug wrote:
> Christopher Kings-Lynne wrote:
> 
> >
> >>> Will adding the logical attribute number break all of the external
> >>> tools? pg_dump, etc are all dependent on attnum now?
> >>>
> >>> Would it be possible to keep the meaning of attnum the same externally
> >>> and add another column internally to represent the physical number?
> >>
> >>
> >>
> >> Interesting idea.  It would require a lot of code renaming in the
> >> backend, but it could be done.
> >
> >
> > Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc 
> > and all the main developers for those lists read all these posts, I 
> > think the massive amount of effort to maintain the external interface 
> > isn't worth it.
> >
> > I can vouch that it would take me exactly 2 minutes to add support for 
> > attlognums in phpPgAdmin.
> 
> Lesson 1 in "SQL for beginners" says "Don't use SELECT * if you rely on 
> the order of columns". This discussion is about fixing a problem that 
> only inexperienced programmers have. It's like an extra set of wheels on 
> top of your car, just in case you drive wrong way...
> 
> What's happening if you simply delete a column? Ordering 1,2,3,5? Insert 
> another column of the same name, as a previously deleted, will it get 
> the old position number? And so on. IMHO, way too much effort for 
> working around situations that should be avoided anyway.
> 
> Regards,
> Andreas
> 
> 
> 



Re: logical column position

From
Andreas Pflug
Date:
Dave Cramer wrote:

>Andreas,
>
>The point of this is to maintain the column position. I don't think that
>an alter of a column type should move the column position.
>
Why should ALTER COLUMN change the column number, i.e. position?

>It may be that programmers should not rely on this, but it happens, and in very
>large projects. If we can avoid unexpected side-affects like moving the
>columns position, then I think we should.
>  
>
This is *expected* if behaviour if you delete and add columns; is there 
any DB system out there that allows to reshuffle the column ordering?

Instead of some order-ordering facility it would be better to support 
all kinds of column type changes, not only binary compatible ones. This 
would help everybody, not only maintainers of ill-designed software.

Regards,
Andreas






Re: logical column position

From
Dave Cramer
Date:
Andreas,


On Wed, 2003-11-19 at 13:07, Andreas Pflug wrote:
> Dave Cramer wrote:
> 
> >Andreas,
> >
> >The point of this is to maintain the column position. I don't think that
> >an alter of a column type should move the column position.
> >
> Why should ALTER COLUMN change the column number, i.e. position?

Rod's current proposed patch does that if you do an alter column alter
type. This is an artifact of the underlying mechanism. (ren old col, add
new col, update newcol=oldcol::newtype). Which is the point of the
logical column number discussion, and the todo item.

> 
> >It may be that programmers should not rely on this, but it happens, and in very
> >large projects. If we can avoid unexpected side-affects like moving the
> >columns position, then I think we should.
> >  
> >
> This is *expected* if behaviour if you delete and add columns; is there 
> any DB system out there that allows to reshuffle the column ordering?

Yes, informix allows you to add the column before|after a column, and
mysql allows for add column after col. those are the only two I know
about.there could be more.
> 
> Instead of some order-ordering facility it would be better to support 
> all kinds of column type changes, not only binary compatible ones. This 
> would help everybody, not only maintainers of ill-designed software.
> 
> Regards,
> Andreas
> 
> 
> 
> 
> 



Re: logical column position

From
Andreas Pflug
Date:
Dave Cramer wrote:

>Andreas,
>
>
>On Wed, 2003-11-19 at 13:07, Andreas Pflug wrote:
>  
>
>>Dave Cramer wrote:
>>
>>    
>>
>>>Andreas,
>>>
>>>The point of this is to maintain the column position. I don't think that
>>>an alter of a column type should move the column position.
>>>
>>>      
>>>
>>Why should ALTER COLUMN change the column number, i.e. position?
>>    
>>
>
>Rod's current proposed patch does that if you do an alter column alter
>type. This is an artifact of the underlying mechanism. (ren old col, add
>new col, update newcol=oldcol::newtype). 
>
I must have missed that, can't find it in hackers?!?
In this case the old attnum value should simply be reused, to retain the 
original ordering. IMHO this is necessary to prevent problems with any 
object referencing a column (index, view, ...) The current proposal 
invents the attpos for column ordering purpose only, but 
views/indexes/etc will reference attnum, and would need updates.

Actually, a column that changes its attnum is just like a table changing 
its oid, i.e. it's not the same object any more. This will provoke 
problems in administration tools (at least in pgAdmin3, which will try 
to refresh its display with the formerly known oid/attnum af ter 
executing a change), and maybe other places too.

To put it differently: a ALTER COLUMN command may never-ever change the 
identifier of the column, i.e. attrelid/attnum.

Regards,
Andreas



Re: logical column position

From
Hannu Krosing
Date:
Andreas Pflug kirjutas K, 19.11.2003 kell 20:45:
> Dave Cramer wrote:
> >>Why should ALTER COLUMN change the column number, i.e. position?
> >
> >Rod's current proposed patch does that if you do an alter column alter
> >type. This is an artifact of the underlying mechanism. (ren old col, add
> >new col, update newcol=oldcol::newtype). 
> >
> I must have missed that, can't find it in hackers?!?

Was on [PATCHES] IIRC.

> In this case the old attnum value should simply be reused, to retain the 
> original ordering. IMHO this is necessary to prevent problems with any 
> object referencing a column (index, view, ...) 

Actually these have to be recreaqted, especially when changing column
type. 

Rod's patchs does that too ;)

> The current proposal 
> invents the attpos for column ordering purpose only,

That's the only place _user_ sees it. The other uses are taken care of
inide database backend.

>  but 
> views/indexes/etc will reference attnum, and would need updates.

they also "reference" column type, and thus need to be updated anyway
when column type changes.

> Actually, a column that changes its attnum is just like a table changing 
> its oid, i.e. it's not the same object any more. This will provoke 
> problems in administration tools (at least in pgAdmin3, which will try 
> to refresh its display with the formerly known oid/attnum af ter 
> executing a change), and maybe other places too.

Sure. _any_ change to database structure could break a client not
(designed to be) aware of that change.

> To put it differently: a ALTER COLUMN command may never-ever change the 
> identifier of the column, i.e. attrelid/attnum.

to be even more restirictive: ALTER COLUMN may never-ever change the
type of the column, as this too may break some apps. Nah!

-------------
Hannu











ALTER COLUMN/logical column position

From
Andreas Pflug
Date:
Hannu Krosing wrote:

>>To put it differently: a ALTER COLUMN command may never-ever change the 
>>identifier of the column, i.e. attrelid/attnum.
>>    
>>
>
>to be even more restirictive: ALTER COLUMN may never-ever change the
>type of the column, as this too may break some apps. Nah!
>
>  
>
Yeah, and the data should be read only :-)

Seriously: Methinks that only a part of the -patches thread was turned 
over to -hackers, some important parts are missing.

First, there are column type changes that don't need any 
index/view/constraint recheck or data transformation at all, being of 
the very popular class "hell, I need to stuff 12 bytes in my 
varchar(10)". Some months ago, this was discussed, and there was 
consense that binarily compatible types may be changed with few special 
precautions (e.g. varchar(12) -> varchar(10) e.g. needs a check for 
len<=10). As a consequence, this kind of column type change is 
implemented in pgAdmin3".

Probably a large percentage of real life column type changes are such 
binarily compatible ones, so it's senseful to handle them separately.

Second, column type changes needing a nontrivial cast function should be 
implemented in a way that preserve attnum. This could be done like this:
- decompile dependent objects, and memorize them for later recreation
- ADD tmpCol, UPDATE tmpCol=col::newtype, DROP old column, cascading to 
dependent objects, RENAME tmpCol (known stuff)
- restore old attnum, which is a simple UPDATE to pg_attribute at this stage
- recreate all dependent objects

Voila! No need for an additional attpos.

Regards,
Andreas




Re: logical column position

From
Christopher Kings-Lynne
Date:
> Why should ALTER COLUMN change the column number, i.e. position?

Because it creates a NEW column.

>> It may be that programmers should not rely on this, but it happens, 
>> and in very
>> large projects. If we can avoid unexpected side-affects like moving the
>> columns position, then I think we should.
>>  
>>
> This is *expected* if behaviour if you delete and add columns; is there 
> any DB system out there that allows to reshuffle the column ordering?

MySQL

Chris




Re: ALTER COLUMN/logical column position

From
Hannu Krosing
Date:
Andreas Pflug kirjutas N, 20.11.2003 kell 01:38:

> Second, column type changes needing a nontrivial cast function should be 
> implemented in a way that preserve attnum. This could be done like this:
> - decompile dependent objects, and memorize them for later recreation
> - ADD tmpCol, UPDATE tmpCol=col::newtype, DROP old column, cascading to 
> dependent objects, RENAME tmpCol (known stuff)
> - restore old attnum, which is a simple UPDATE to pg_attribute at this stage

and suddenly your table is broken, as you can't retrieve the tmpCol when
the attnum points to the dropped old column which has data in the format
for old type ...

the whole point of separating attnum and attpos is that attnum is used
internally to retrieve the data and you can't change it by just
UPDATEing pg_attribute.

---------------
Hannu



Re: ALTER COLUMN/logical column position

From
Andreas Pflug
Date:
Hannu Krosing wrote:

>
>attnum is used internally to retrieve the data 
>
>  
>
Oops...

So if an additional column number is invented, it should not be a 
logical column number, but a physical storage number for internal data 
retrieval. This way, the "user interface" doesn't change, and all those 
"SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the 
expected result.

Regards,
Andreas



Re: ALTER COLUMN/logical column position

From
Hannu Krosing
Date:
Andreas Pflug kirjutas N, 20.11.2003 kell 11:40:
> Hannu Krosing wrote:
> 
> >
> >attnum is used internally to retrieve the data 
> >
> >  
> >
> Oops...
> 
> So if an additional column number is invented, it should not be a 
> logical column number, but a physical storage number for internal data 
> retrieval. 

You are just shifting the interface problems to a place needing way more
changes in the backend. There will be some problems either way.

also, tools needing knowledge should start using information schema as
much as they can, making internal reshufflings less of a problem.

> This way, the "user interface" doesn't change, and all those 
> "SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the 
> expected result.

Depending on what you expect ;)

If you expect the above to give you all active columns as orderd as they
are stored, then it does not give you what you expect.

Btw, most of these concerns (and more) were already iterated when DROP
column was done causing gaps in attnum. There were a lot of doomsday
profecies, but in the end it went quite smoothly. The tools needing
internal knowledge about storage (meaning any tool doing select .. from
pg_...) have always needed some upgrades for new verions.

IMHO, The only behaviour visible to common user we should worry about is
SELECT * , and a special column for solving this is _the_ easiest way to
do it.

-------------
Hannu



Re: ALTER COLUMN/logical column position

From
Dave Cramer
Date:
Which is what started the whole discussion.

Dave
On Thu, 2003-11-20 at 04:40, Andreas Pflug wrote:
> Hannu Krosing wrote:
> 
> >
> >attnum is used internally to retrieve the data 
> >
> >  
> >
> Oops...
> 
> So if an additional column number is invented, it should not be a 
> logical column number, but a physical storage number for internal data 
> retrieval. This way, the "user interface" doesn't change, and all those 
> "SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the 
> expected result.
> 
> Regards,
> Andreas
> 
> 



Re: logical column position

From
Manfred Koizar
Date:
On Wed, 19 Nov 2003 19:07:23 +0100, Andreas Pflug
<pgadmin@pse-consulting.de> wrote:
>is there 
>any DB system out there that allows to reshuffle the column ordering?

Firebird:ALTER TABLE tname ALTER COLUMN cname POSITION 7;

ServusManfred


Re: ALTER COLUMN/logical column position

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> You are just shifting the interface problems to a place needing way more
> changes in the backend. There will be some problems either way.

Exactly.  I'm considerably more worried about breaking out-of-the-way
places in the backend than I am about what order someone's admin tool
presents the columns in.

> Btw, most of these concerns (and more) were already iterated when DROP
> column was done causing gaps in attnum. There were a lot of doomsday
> profecies, but in the end it went quite smoothly.

That is a good comparison point.  I'm inclined to think that we should
do it in a way that minimizes backend changes.  The way to do that is
to keep attnum with its current definition (physical position) and add
a new column for the logical position, which only a small number of
places will need to care about.
        regards, tom lane


Re: ALTER COLUMN/logical column position

From
Peter Eisentraut
Date:
Tom Lane writes:

> Exactly.  I'm considerably more worried about breaking out-of-the-way
> places in the backend than I am about what order someone's admin tool
> presents the columns in.

Clearly, the effort of adding logical column numbers will consist of
making choices between physical and logical numbers in the backend in some
places.  So one option is to replace some uses of attnum by attlognum.
The other optionis to replace *all* uses of attnum by attphysnum and then
replace some uses of attphysnum by attnum.  To me, this looks like an
equal "risk" as far as the backend goes.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: ALTER COLUMN/logical column position

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> Exactly.  I'm considerably more worried about breaking out-of-the-way
>> places in the backend than I am about what order someone's admin tool
>> presents the columns in.

> Clearly, the effort of adding logical column numbers will consist of
> making choices between physical and logical numbers in the backend in some
> places.  So one option is to replace some uses of attnum by attlognum.
> The other optionis to replace *all* uses of attnum by attphysnum and then
> replace some uses of attphysnum by attnum.  To me, this looks like an
> equal "risk" as far as the backend goes.

This would be a reasonable assessment if we had our hands on every line
of backend code that exists.  But you are neglecting the probability of
breaking user-written C functions, PL languages outside the main distro,
etc.

If we were going to go about this in a way that does not localize the
changes, I'd be inclined to use "attlognum" and "attphysnum" ... that
is, *deliberately* break every use that hasn't been looked at and
updated.  Even that would not guarantee catching all the trouble spots;
for example loop indexes and attnums passed as function parameters might
not have names that would be caught by a simplistic search-and-replace
update.

I'm for localizing the changes.
        regards, tom lane


Re: logical column position

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
> On Mon, 2003-11-17 at 20:24, Christopher Kings-Lynne wrote:
>> BTW, one main consideration is that all the postgres admin apps will now 
>> need to support ORDER BY attlognum for 7.5+.

> Yeah... how about maintaining attnum for the logical attribute number
> and create an attphysnum or something for the physical position instead?
> This is more intrusive into the source, but you don't need to teach new
> tricks to external entities.
> [ and similar remarks from other people elsewhere in the thread ]

It's completely fallacious to imagine that we could make this change be
transparent to external applications.  To take two examples:

1. How many places do you think know that pg_attribute.attnum links to
pg_attrdef.adnum?  pg_dump, psql, and the JDBC driver all appear to
know that, in a quick search of the CVS tree; I haven't even bothered to
look at pgadmin and the other apps that are likely to have such
dependencies.

2. How about linking pg_attribute.attnum to entries in pg_index.indkey?
Lots of apps know about that too.

Unless you are going to change the meanings of pg_index.indkey and
pg_attrdef.adnum, you can't simply redefine attnum as a logical column
position.  And if you do make such a change you will break code
elsewhere.

If we add a *new* column attlogpos, without changing the semantics
of attnum, then I think we have a fighting chance of making this work
without an undue amount of effort.  I see no prospect that we can
change the meaning of attnum without breaking things far and wide.
        regards, tom lane


Re: logical column position

From
Andreas Pflug
Date:
Tom Lane wrote:

>It's completely fallacious to imagine that we could make this change be
>transparent to external applications.  To take two examples:
>
>1. How many places do you think know that pg_attribute.attnum links to
>pg_attrdef.adnum?  pg_dump, psql, and the JDBC driver all appear to
>know that, in a quick search of the CVS tree; I haven't even bothered to
>look at pgadmin and the other apps that are likely to have such
>dependencies.
>
>2. How about linking pg_attribute.attnum to entries in pg_index.indkey?
>Lots of apps know about that too.
>
>Unless you are going to change the meanings of pg_index.indkey and
>pg_attrdef.adnum, you can't simply redefine attnum as a logical column
>position.  And if you do make such a change you will break code
>elsewhere.
>
>If we add a *new* column attlogpos, without changing the semantics
>of attnum, then I think we have a fighting chance of making this work
>without an undue amount of effort.  I see no prospect that we can
>change the meaning of attnum without breaking things far and wide.
>
>  
>

I don't quite understand your argumentation.
Currently, attnum is used
1) to determine position (the concern)
2) as part of the unique identifier, as used by index, FK etc
3) as pointer for data retrieval.

If only the retrieval functions would use attstoragenum or however you'd 
call it, all other references to attnum can remain untouched. Actual 
physical reordering would be hidden almost completely. This is a bit 
like abstracting a primary key from the row's physical storage location.


Regards,
Andreas

>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 8: explain analyze is your friend
>
>  
>




Re: logical column position

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> I don't quite understand your argumentation.

My point is that to change attnum into a logical position without
breaking client apps (which is the ostensible reason for doing it
that way), we would need to redefine all system catalog entries that
reference columns by attnum so that they also store logical rather than
physical position.  That has a number of serious problems, one big one
being the difficulty of updating them all correctly during a column
renumbering operation.  More, it turns what would otherwise be a
relatively localized patch into a massive and bug-prone backend
modification.

I think it is better to consider attnum as sort of a mini-OID: any one
column has a uniquely assigned attnum that will never change and can
be relied on to identify that column.  This is essentially how it is
being used now (remember attnum is part of the PK for pg_attribute)
and the fact that it is also the physical position is really rather
incidental as far as the system catalogs are concerned.

You're quite right that attnum is serving three purposes, but that
doesn't mean that we can choose at random which purpose(s) to decouple.
Abandoning the assumption that attnum is a permanent identifier would
break a lot of things --- probably not only in the backend, either.
        regards, tom lane


Re: logical column position

From
Andreas Pflug
Date:
Tom Lane wrote:

>Andreas Pflug <pgadmin@pse-consulting.de> writes:
>  
>
>>I don't quite understand your argumentation.
>>    
>>
>
>My point is that to change attnum into a logical position without
>breaking client apps (which is the ostensible reason for doing it
>that way), we would need to redefine all system catalog entries that
>reference columns by attnum so that they also store logical rather than
>physical position.  That has a number of serious problems, one big one
>being the difficulty of updating them all correctly during a column
>renumbering operation.  More, it turns what would otherwise be a
>relatively localized patch into a massive and bug-prone backend
>modification.
>
>I think it is better to consider attnum as sort of a mini-OID: any one
>column has a uniquely assigned attnum that will never change and can
>be relied on to identify that column.  This is essentially how it is
>being used now (remember attnum is part of the PK for pg_attribute)
>and the fact that it is also the physical position is really rather
>incidental as far as the system catalogs are concerned.
>  
>
I agree considering attrelid/attnum as kind-of OID, but a relation's 
pg_class.oid won't change at ALTER TABLE either, I'd expect the same 
from ALTER COLUMN.

>You're quite right that attnum is serving three purposes, but that
>doesn't mean that we can choose at random which purpose(s) to decouple.
>Abandoning the assumption that attnum is a permanent identifier would
>break a lot of things --- probably not only in the backend, either.
>  
>

Maybe my proposal wasn't clear enough:
Just as an index references a pg_class entry by it's OID, not some value 
identifying it's physical storage, all objects might continue 
referencing columns by attnum. Only tuple handling functions like 
heap_getattr and heap_formtuple need to know how to extract a Datum by 
its attnum from a HeapTuple or how to compile a HeapTuple correctly. If 
reshuffling columns is done inside of these functions, it would be 
transparent to the rest of the backend and the clients. Hopefully, there 
are not too much of such functions, or fancy modules bypassing them...


Regards,
Andreas




Re: logical column position

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Maybe my proposal wasn't clear enough:
> Just as an index references a pg_class entry by it's OID, not some value 
> identifying it's physical storage, all objects might continue 
> referencing columns by attnum.

That's exactly the same thing I am saying.  Your mistake is to assume
that this function can be combined with identification of a (changeable)
logical column position.  It can't.  Changeability and immutability are
just not compatible requirements.
        regards, tom lane


Re: logical column position

From
Andreas Pflug
Date:
Tom Lane wrote:

>Andreas Pflug <pgadmin@pse-consulting.de> writes:
>  
>
>>Maybe my proposal wasn't clear enough:
>>Just as an index references a pg_class entry by it's OID, not some value 
>>identifying it's physical storage, all objects might continue 
>>referencing columns by attnum.
>>    
>>
>
>That's exactly the same thing I am saying.  Your mistake is to assume
>that this function can be combined with identification of a (changeable)
>logical column position.  It can't.  Changeability and immutability are
>just not compatible requirements.
>  
>
In the mind of a programmer, a ALTER COLUMN doesn't create a new column, 
but merely changes some attributes of an existing column. In this sense, 
changeability and immutability are not controversal.

Digging deeper:

TupDesc contains an array of physical attr descriptions, and to access a 
column description attnum is taken as index into that array (taken from 
fastgetattr).
   return fetchatt(tupleDesc->attrs[attnum-1], ...)

The physical location can easily reordered if there's an additional 
array, to translate attnum into the array index.
   return fetchatt(tupleDesc->attrs[tupleDesc->attrpos[attnum-1]] ...

For sure, reordering (i.e. changing the attrpos array) may only be 
performed as long as the column isn't referenced.

Regards,
Andreas




Re: logical column position

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> To put it differently: a ALTER COLUMN command may never-ever change the 
> identifier of the column, i.e. attrelid/attnum.

If the ALTER is changing the column type, it's not really the same
column anymore; I see nothing wrong with assigning a new attnum in that
scenario.  It's not like you can simply change the type and not go visit
the references in such a case.
        regards, tom lane


Re: logical column position

From
Andreas Pflug
Date:
Tom Lane wrote:

>If the ALTER is changing the column type, it's not really the same
>column anymore;
>

This doesn't strike. "If the ALTER is changing the number of columns, 
it's not really the same table anymore" is as true as your statement. 
Still, pg_class.oid remains the same for ADD and DROP column.

> I see nothing wrong with assigning a new attnum in that
>scenario.  It's not like you can simply change the type and not go visit
>the references in such a case.
>  
>

But this fix is about automatically updating references as well, making 
the ALTER COLUMN appear a low-impact change to the user (which obviously 
isn't true, unless my proposed shortcut for binary compatible type 
changes is implemented).

When dropping and recreating an object, nobody would expect to get the 
same identifier. When altering, I *do* expect the identifier to remain 
the same.

Regards,
Andreas




Re: ALTER COLUMN/logical column position

From
Andreas Pflug
Date:
Hannu Krosing wrote:

>
>
>You are just shifting the interface problems to a place needing way more
>changes in the backend. There will be some problems either way.
>  
>
Not quite. Certainly, basing internal storage on attstoragenum is more 
work in the backend, but less (precisely: zero) work on an unknown 
number of frontend tools and apps.

>also, tools needing knowledge should start using information schema as
>much as they can, making internal reshufflings less of a problem.
>  
>
We had this discussion. information_schema doesn't deliver enough info 
needed for admin tools.

>  
>
>>This way, the "user interface" doesn't change, and all those 
>>"SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the 
>>expected result.
>>    
>>
>
>Depending on what you expect ;)
>  
>
Usually, nobody should care about the column ordering, but for those 
unfortunate guys that rely on a specific SELECT * ordering the list of 
columns displayed in admin tools must show that ordering; this is what 
current admin tools expect from attnum. No SQL user would ever care 
about internal storage details/pointers/counters, so any admin tool 
would need to ORDER BY CASE WHEN version>=7.5 THEN attpos ELSE attnum 
END (and the unique key to pg_attribute, as seen from the tool, changes 
from refoid/attnum to refoid/attindex too).

>If you expect the above to give you all active columns as orderd as they
>are stored, then it does not give you what you expect.
>
>Btw, most of these concerns (and more) were already iterated when DROP
>column was done causing gaps in attnum. There were a lot of doomsday
>profecies, but in the end it went quite smoothly.
>
I don't bother about missing attnum values, even 1,2,3,5,6,8 is nicely 
ordered.

> The tools needing
>internal knowledge about storage (meaning any tool doing select .. from
>pg_...) have always needed some upgrades for new verions.
>  
>
Yes, but changes to pg_... should retain the usual meanings as much as 
possible, so older tools continue to work. The discussed change is 
problematic because old tools *seem* to work ok, but their attnum 
interpretation would be wrong.

>IMHO, The only behaviour visible to common user we should worry about is
>SELECT * , and a special column for solving this is _the_ easiest way to
>do it.
>
>  
>
Surely this is the easiest way. But it has the biggest impact on clients 
too. I'm just imagining what would happen to pgAdmin3. The column number 
would have to display attpos (this is what the user is interested in to 
see the ordering), while index, FK and so forth will continue to display 
attnum. This seems quite unwanted to me.


---
Are there any comments on the proposed lean way to alter columns for 
trivial type changes?

Regards,
Andreas



Re: ALTER COLUMN/logical column position

From
Andreas Pflug
Date:

>>We had this discussion. information_schema doesn't deliver enough info 
>>needed for admin tools.
>>    
>>
>
>It should. This is the sole reason for existance of it. If it is not
>enough, then it should be updated.
>  
>
It can't. ANSI says only objects owned by the user are shown. Admins 
might be quite unhappy about that...
pg_catalog views don't help either, just look at pg_tables. It doesn't 
even have the oid, how should a table be identified uniquely?
The system views are unusable for hardcore admin purposes, until they 
include *. So I'd rather use the tables directly.

>>
>>Usually, nobody should care about the column ordering, but for those 
>>unfortunate guys that rely on a specific SELECT * ordering the list of 
>>columns displayed in admin tools must show that ordering; this is what 
>>current admin tools expect from attnum. No SQL user would ever care 
>>about internal storage details/pointers/counters, so any admin tool 
>>would need to ORDER BY CASE WHEN version>=7.5 THEN attpos ELSE attnum 
>>END
>>    
>>
>
>This won't work anyway if table is missing column attpos .
>
Sorry to be not precise enough, this was meant as meta code. Of course 
the query must be built version dependent.

>Still there were several predictions of all admin tools breaking as a
>result of gaps.
>  
>
wasn' me.

>
>Can you name one PG version change from A.N to A.M where old admin tools
>have not needed any changes ?
>  
>
Older tools usually continue to work, they just don't know new features. 
Maybe some esotheric features break, so few people notice.

>attnum interpretation of pgAdmin3 is already wrong - it claims it to be
>Position even when some previous columns are dropped. So you can have a
>table which has 1 column with "Position" 3 ;)
>  
>
attnum isn't interpreted at all in pgAdmin3, only used for ordering. It 
can't be used as key to a column any more, if altering a column would 
create a new attnum with old name (and old attpos). In this sense, the 
key is attrelid/attpos, because only these don't change for an ALTER 
COLUMN statement.

Imagine a sql update to a non-pk column would change the pk of the row, 
so you'd have to reread the row by its content to obtain the new pk 
value. This wouldn't make you happy, right? Same here.

>  
>
>>>IMHO, The only behaviour visible to common user we should worry about is
>>>SELECT * , and a special column for solving this is _the_ easiest way to
>>>do it.
>>>
>>>      
>>>
>>Surely this is the easiest way. But it has the biggest impact on clients 
>>too. 
>>    
>>
>
>Not all clients - just admin tools. And we dont have that many admin
>tools. And IMNSHO admin tools should move to using information_schema as
>much as possible.
>
>And if information_schema is inadequate then fix it instead of bypassing
>it.
>  
>
It is unfixable. Being ANSI-conformant, it prevents the admin seeing 
everything.

>I think keeping know-how about retrieving postgresql structure inside of
>an application instead of making it readily available in
>information_schema is either
> a) an egoistic attemt of shutting out competition
> b) laziness
>or
> c) both
>
><grin>
>
>  
>
<comment mode=suppress> ******** </comment>

>It is also bad design, as it ties ones tool to backend structure too
>tightly. 
>
>Backend structure will keep changing and the last thing we want to hold
>it back is some frontend tool which thinks it knows better how to
>organize data in backend. What if some completely new storage is added
>to postgreSQL (ancient Postgres versions had built-in support for
>several storages). Should all frontend tools (including ?DBC drivers)
>need updating or just information_schema ?
>  
>

Again, I'm not against using information_schema. I tried to use it (for 
non-admin tool purposes!) and failed badly. pgAdmin3 is a tool for *all* 
PostgreSQL features, not just for some common ANSI stuff.

>>I'm just imagining what would happen to pgAdmin3. The column number 
>>would have to display attpos (this is what the user is interested in to 
>>see the ordering),
>>    
>>
>
>No they are interested in position as you mentioned above, they didn't
>want to see attnum (i.e 1,2,3,5,6,8 in your example) before either.
>
>I think it is a bug that pgAdmin3 shows attnum instead the real
>position.
>
>  
>
It shows what's in the db, to identify the column. I can't imagine why I 
should show a column number, programmers should address by column name 
and nothing else.

>Sorry, I must have missed it ;(  could you give a link to archived copy.
>  
>
About a week ago, when the message you answered on was posted. Took a 
week now to appear...

>But I think that nobody objected, but nobody didn't volunteer to do the
>work either ;)
>
>At least that was the impression i got from an answer to my similar
>question on growing varchars and dropping isnull's without forcing
>column copies and constraint checks.
>  
>

Yeah, interesting. For my observations, these trivial changes make 90-95 
% of daily column change work, that's why I implemented it in pgAdmin3 
(targeting the system tables directly...), so it's worth the effort 
handling them separately. I might add it some time to the backend (as an 
additional code path to the big version).

Regards,
Andreas




Re: ALTER COLUMN/logical column position

From
Hannu Krosing
Date:
Andreas Pflug kirjutas N, 20.11.2003 kell 16:10:
> Hannu Krosing wrote:
> 
> >
> >
> >You are just shifting the interface problems to a place needing way more
> >changes in the backend. There will be some problems either way.
> >  
> >
> Not quite. Certainly, basing internal storage on attstoragenum is more 
> work in the backend, but less (precisely: zero) work on an unknown 
> number of frontend tools and apps.

With stress on "unknown number" ;)

> >also, tools needing knowledge should start using information schema as
> >much as they can, making internal reshufflings less of a problem.
> >  
> >
> We had this discussion. information_schema doesn't deliver enough info 
> needed for admin tools.

It should. This is the sole reason for existance of it. If it is not
enough, then it should be updated.

Updating information_schema would also make developers of other admin
tools happy. Remember - competition is good ;)

Compatibility with old verions of admin tools wont happen anyway, so we
should not let that lock backend into bad development decisions. Look
what happened to DOS-WIN16-WIN32.

> >>This way, the "user interface" doesn't change, and all those 
> >>"SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the 
> >>expected result.
> >>    
> >>
> >
> >Depending on what you expect ;)
> >  
> >
> Usually, nobody should care about the column ordering, but for those 
> unfortunate guys that rely on a specific SELECT * ordering the list of 
> columns displayed in admin tools must show that ordering; this is what 
> current admin tools expect from attnum. No SQL user would ever care 
> about internal storage details/pointers/counters, so any admin tool 
> would need to ORDER BY CASE WHEN version>=7.5 THEN attpos ELSE attnum 
> END

This won't work anyway if table is missing column attpos . You have to
have different queries for different versions. Add it is preferential to
keep these different queries in information_schema of corresponding
databases not all in frontend tool.

You still need different queries for old databases which did not support
schemas.

>  (and the unique key to pg_attribute, as seen from the tool, changes 
> from refoid/attnum to refoid/attindex too).

the key needs no change, just the ORDER BY clause.

> >If you expect the above to give you all active columns as orderd as they
> >are stored, then it does not give you what you expect.
> >
> >Btw, most of these concerns (and more) were already iterated when DROP
> >column was done causing gaps in attnum. There were a lot of doomsday
> >profecies, but in the end it went quite smoothly.
> >
> I don't bother about missing attnum values, even 1,2,3,5,6,8 is nicely 
> ordered.

Still there were several predictions of all admin tools breaking as a
result of gaps.

> > The tools needing
> >internal knowledge about storage (meaning any tool doing select .. from
> >pg_...) have always needed some upgrades for new verions.
> >  
> >
> Yes, but changes to pg_... should retain the usual meanings as much as 
> possible, so older tools continue to work. 

Can you name one PG version change from A.N to A.M where old admin tools
have not needed any changes ?

> The discussed change is 
> problematic because old tools *seem* to work ok, but their attnum 
> interpretation would be wrong.

attnum interpretation of pgAdmin3 is already wrong - it claims it to be
Position even when some previous columns are dropped. So you can have a
table which has 1 column with "Position" 3 ;)

> >IMHO, The only behaviour visible to common user we should worry about is
> >SELECT * , and a special column for solving this is _the_ easiest way to
> >do it.
> >
> Surely this is the easiest way. But it has the biggest impact on clients 
> too. 

Not all clients - just admin tools. And we dont have that many admin
tools. And IMNSHO admin tools should move to using information_schema as
much as possible.

And if information_schema is inadequate then fix it instead of bypassing
it.

I think keeping know-how about retrieving postgresql structure inside of
an application instead of making it readily available in
information_schema is eithera) an egoistic attemt of shutting out competitionb) laziness
orc) both

<grin>

It is also bad design, as it ties ones tool to backend structure too
tightly. 

Backend structure will keep changing and the last thing we want to hold
it back is some frontend tool which thinks it knows better how to
organize data in backend. What if some completely new storage is added
to postgreSQL (ancient Postgres versions had built-in support for
several storages). Should all frontend tools (including ?DBC drivers)
need updating or just information_schema ?

> I'm just imagining what would happen to pgAdmin3. The column number 
> would have to display attpos (this is what the user is interested in to 
> see the ordering),

No they are interested in position as you mentioned above, they didn't
want to see attnum (i.e 1,2,3,5,6,8 in your example) before either.

I think it is a bug that pgAdmin3 shows attnum instead the real
position.

>  while index, FK and so forth will continue to display 
> attnum. This seems quite unwanted to me.

Actually the column name is shown (at least I could not find attnum
anywhere in keys or indexes)

> ---
> Are there any comments on the proposed lean way to alter columns for 
> trivial type changes?

Sorry, I must have missed it ;(  could you give a link to archived copy.

But I think that nobody objected, but nobody didn't volunteer to do the
work either ;)

At least that was the impression i got from an answer to my similar
question on growing varchars and dropping isnull's without forcing
column copies and constraint checks.

-----------------
Hannu