Thread: logical column position
I'd like to add a new column to pg_attribute that specifies the attribute's "logical position" within its relation. The idea here is to separate the logical order of the columns in a relation from the on-disk storage of the relation's tuples. This allows us to easily & quickly change column order, add an additional column before or after an existing column, etc. At present, attnum basically does three things: identifies an column within a relation, indicates which columns are system columns, and defines the order of a relation's columns. I'd like to move this last functionality into a separate pg_attribute column named "attpos" (or "attlogicalpos"): - when the table is created, attnum == attpos. System columns have attpos < 0, as with attnum. At no pointwill two columns of the same relation have the same attpos. - when returning output to the client and no column ordering is implied by the query (e.g. "SELECT * ..."),we sort the columns in ascending attpos order. - when storing a tuple on disk, we don't consider attpos - if we want to change the order of the column's in a relation, we can do so merely by updating pg_attribute;no changes to the on-disk storage of the relation should be necessary A few notes: (a) ISTM this should also apply to COPY TO and COPY FROM if the user didn't supply a column list. Is this reasonable?It would break dumps of the table's contents, but then again, dumps aren't guaranteed to remain validover arbitrary changes to the table's meta-data. (b) Using the above scheme that attnum == attpos initially, there won't be any gaps in the sequence of attpos values.That means that if, for example, we want to move the column in position 50 to position 1, we'll need to changethe position's of all the columns in positions [1..49] (and suffer the resulting MVCC bloat in pg_attribute).Changing the column order is hardly a performance critical operation, so that might be acceptable. If we want to avoid this, one easy (but arguably unclean) way to do so would be to make the initial value of attpos== attnum * 1000, and make attpos an int4 rather than an int2. Then, we can do most column reordering operationswith only a single pg_attribute update -- in the worst-case that enough re-orderings are done that we overflowthe 999 "padding" positions, we can just fall-back to doing multiple pg_attribute updates. Is this worthdoing, and/or is there a better way to achieve the same effect? (c) Do I need to consider inheritance? Comments are welcome. -Neil
Neil Conway wrote: > I'd like to add a new column to pg_attribute that specifies the > attribute's "logical position" within its relation. The idea here is > to separate the logical order of the columns in a relation from the > on-disk storage of the relation's tuples. This allows us to easily & > quickly change column order, add an additional column before or after > an existing column, etc. > > At present, attnum basically does three things: identifies an column > within a relation, indicates which columns are system columns, and > defines the order of a relation's columns. I'd like to move this last > functionality into a separate pg_attribute column named "attpos" (or > "attlogicalpos"): > > - when the table is created, attnum == attpos. System columns > have attpos < 0, as with attnum. At no point will two > columns of the same relation have the same attpos. > > - when returning output to the client and no column ordering > is implied by the query (e.g. "SELECT * ..."), we sort the > columns in ascending attpos order. Seems the only cases where attpos would be used would be SELECT *, INSERT with no column list, and COPY --- seems like a nifty feature. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Neil Conway <neilc@samurai.com> writes: > At present, attnum basically does three things: identifies an column > within a relation, indicates which columns are system columns, and > defines the order of a relation's columns. I'd like to move this last > functionality into a separate pg_attribute column named "attpos" (or > "attlogicalpos"): "attpos" is a horrid choice of name, because no one will be able to remember which of "attnum" and "attpos" is which. Pick a more distinct name. Offhand the best thing I can think of is "attlognum" or "attlogpos". > - when the table is created, attnum == attpos. System columns > have attpos < 0, as with attnum. At no point will two > columns of the same relation have the same attpos. What are you going to do with deleted columns? I'd be inclined to give them all attlogpos = 0, but that destroys your last comment. > (a) ISTM this should also apply to COPY TO and COPY FROM if the user > didn't supply a column list. Is this reasonable? Yes, also INSERT INTO, also the implicit ordering of output columns of a JOIN, also the matching of aliases to columns in a FROM-list alias, probably one or two other places. SQL exposes column ordering in more places than just "SELECT *". > If we want to avoid this, one easy (but arguably unclean) way to > do so would be to make the initial value of attpos == attnum * > 1000, and make attpos an int4 rather than an int2. Then, we can > do most column reordering operations with only a single > pg_attribute update -- in the worst-case that enough > re-orderings are done that we overflow the 999 "padding" > positions, we can just fall-back to doing multiple pg_attribute > updates. Is this worth doing, and/or is there a better way to > achieve the same effect? That seems horribly messy. Just renumber. > (c) Do I need to consider inheritance? Yes. I think it'd be good if things were constrained so that columns 1..n in a parent table always matched columns 1..n in every child, which is not true now after adding/dropping columns. That would make it easier/cheaper/more reliable to match up which child columns are to be referenced in an inherited query (see adjust_inherited_attrs). I think the effective constraints would have to be about the same as what we now impose on column names in an inheritance hierarchy. You have not presented any proposal for exactly what ALTER TABLE operations would be offered to manipulate the column positions. My recollection is that some consensus was reached on that point in the last thread we had on this issue --- have you consulted the archives? regards, tom lane
On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote: > > (c) Do I need to consider inheritance? > > Yes. I think it'd be good if things were constrained so that columns > 1..n in a parent table always matched columns 1..n in every child, > which is not true now after adding/dropping columns. That would make it > easier/cheaper/more reliable to match up which child columns are to be > referenced in an inherited query (see adjust_inherited_attrs). No way, because of multiple inheritance. Each child should have an attparentnum, which would point to the parent's attnum for this to work ... -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Aprender sin pensar es in�til; pensar sin aprender, peligroso" (Confucio)
Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes: > On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote: > (c) Do I need to consider inheritance? >> >> Yes. I think it'd be good if things were constrained so that columns >> 1..n in a parent table always matched columns 1..n in every child, >> which is not true now after adding/dropping columns. > No way, because of multiple inheritance. Each child should have an > attparentnum, which would point to the parent's attnum for this to work ... Hm, good point. And I think we merge identically-named columns inherited from different parents, which would mean that "attparentnum" wouldn't have a unique value anyway. Perhaps rearranging a parent's columns shouldn't have *any* direct effect on a child? Seems ugly though. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > "attpos" is a horrid choice of name, because no one will be able to > remember which of "attnum" and "attpos" is which. Pick a more > distinct name. Offhand the best thing I can think of is "attlognum" > or "attlogpos". Actually, I deliberately chose attpos rather than attlognum (which is what some people had been calling this feature earlier). My reasoning was that the "logical number" is really a nonsensical idea: we just invented it on the spot. In contrast, a "position" is a fairly natural thing for an attribute to have -- it's a notion with some counterpart in the real world. To me, at least, it seems intuitive that an "attnum" would identify a column whereas an "attpos" would specify the column's position. I'm happy to change the name if there's a consensus that attpos isn't a good choice -- what does everyone think? > What are you going to do with deleted columns? I'd be inclined to > give them all attlogpos = 0, but that destroys your last comment. I hadn't planned to do anything in particular for deleted columns: since they are never displayed to the user, does it matter what their attpos is? In any event, the property that no two columns in a table have the same logical number isn't important anyway. > You have not presented any proposal for exactly what ALTER TABLE > operations would be offered to manipulate the column positions. I'd like to get the backend storage side of things implemented first. I'll take a look at the archives before I do any UI work -- thanks for the suggestion. -Neil
Neil Conway <neilc@samurai.com> writes: > Actually, I deliberately chose attpos rather than attlognum (which is > what some people had been calling this feature earlier). My reasoning > was that the "logical number" is really a nonsensical idea: we just > invented it on the spot. True ... > In contrast, a "position" is a fairly natural > thing for an attribute to have -- it's a notion with some counterpart > in the real world. But "position" could at least as logically be considered to mean the physical position in the tuple. I still say that these names are ripe for confusion. I don't have a better choice of name offhand, but if we spend 1% of the time already spent arguing about these issues on finding a better name, I'm sure we can think of one ;-) regards, tom lane
> I don't have a better choice of name offhand, but if we spend 1% of the > time already spent arguing about these issues on finding a better name, > I'm sure we can think of one ;-) virtual (attvirtnum) external (attextnum) atttisoywnum -> attribute this is the one you want number
On Thu, 2003-11-20 at 23:27, Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > > Actually, I deliberately chose attpos rather than attlognum (which is > > what some people had been calling this feature earlier). My reasoning > > was that the "logical number" is really a nonsensical idea: we just > > invented it on the spot. > > True ... > > > In contrast, a "position" is a fairly natural > > thing for an attribute to have -- it's a notion with some counterpart > > in the real world. > > But "position" could at least as logically be considered to mean the > physical position in the tuple. I still say that these names are ripe > for confusion. > > I don't have a better choice of name offhand, but if we spend 1% of the > time already spent arguing about these issues on finding a better name, > I'm sure we can think of one ;-) > Seems merging the two would work... attlogpos, the attributes logical position. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > Seems merging the two would work... attlogpos, the attributes > logical position. Unless anyone has any further objections, I'll switch to using attlogpos. -Neil