Thread: Number of attributes in HeapTupleHeader
Currently there's an int16 t_natts in HeapTupleHeaderData. This number is stored on disk for every single tuple. Assuming that the number of attributes is constant for all tuples of one relation, we have a lot of redundancy here. Almost everywhere in the sources where HeapTupleHeader->t_natts is used, there is a HeapTuple and/or TupleDesc around. In struct tupleDesc there is int natts /* Number of attributes in the tuple */. If we moved t_natts from struct HeapTupleHeaderData to struct HeapTupleData, we'd have this number whenever we need it and wouldn't have to write it to disk millions of times.

Two years ago there were thoughts about ADD COLUMN and whether it should touch all tuples or just change the metadata. Could someone tell me what eventually came out of this discussion and where I can find the relevant pieces of source code, please? What about DROP COLUMN?

If there is interest in reducing on-disk tuple header size and I have not missed any strong arguments against dropping t_natts, I'll investigate further. Comments?

On Fri, 3 May 2002 01:40:42 +0000 (UTC), tgl@sss.pgh.pa.us (Tom Lane) wrote:
>Now if we could get rid of 8 bytes in the header, I'd get excited ;-)

If this is doable, we arrive at 6 bytes. And what works for t_natts should also work for t_hoff; that's another byte. Are we getting nearer?

Servus
Manfred
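A minimal sketch of what Manfred proposes, using hypothetical, heavily simplified structs (DiskHeaderWithNatts, DiskHeaderNoNatts, SimpleTupleDesc, InMemoryTuple and wrap_tuple() are invented for illustration and are not PostgreSQL's real HeapTupleHeaderData, HeapTupleData or tupleDesc). The attribute count moves out of the on-disk header and is filled in from the relation's descriptor when a tuple is fetched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins: NOT the real PostgreSQL
 * HeapTupleHeaderData / HeapTupleData / tupleDesc definitions. */

/* Today: the attribute count is part of every on-disk header. */
typedef struct DiskHeaderWithNatts {
    int16_t  natts;      /* written to disk for every tuple */
    uint16_t infomask;   /* stand-in for the remaining header fields */
} DiskHeaderWithNatts;

/* Proposal: the on-disk header no longer carries natts. */
typedef struct DiskHeaderNoNatts {
    uint16_t infomask;
} DiskHeaderNoNatts;

/* The relation's descriptor already knows the attribute count. */
typedef struct SimpleTupleDesc {
    int natts;
} SimpleTupleDesc;

/* In-memory wrapper (the HeapTupleData role): natts lives here instead,
 * copied from the descriptor whenever a tuple is fetched. */
typedef struct InMemoryTuple {
    int natts;
    const DiskHeaderNoNatts *header;
} InMemoryTuple;

static InMemoryTuple wrap_tuple(const DiskHeaderNoNatts *header,
                                const SimpleTupleDesc *desc)
{
    assert(header != NULL && desc != NULL);
    InMemoryTuple tup = { desc->natts, header };
    return tup;
}
```

The sketch is only correct under Manfred's stated assumption that every tuple of a relation has desc->natts attributes; a tuple written before an ALTER TABLE ADD COLUMN would silently get the wrong count.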
On Sun, 05 May 2002 23:48:31 +0200
"Manfred Koizar" <mkoi-pg@aon.at> wrote:
> Two years ago there have been thoughts about ADD COLUMN and whether it
> should touch all tuples or just change the metadata. Could someone
> tell me, what eventually came out of this discussion and where I find
> the relevant pieces of source code, please.

See AlterTableAddColumn() in commands/tablecmds.c

> If there is interest in reducing on-disk tuple header size and I have
> not missed any strong arguments against dropping t_natts, I'll
> investigate further. Comments?

I'd definitely be interested -- let me know if you'd like any help...

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
On Sun, 5 May 2002 18:07:27 -0400, Neil Conway <nconway@klamath.dyndns.org> wrote:
>See AlterTableAddColumn() in commands/tablecmds.c

Thanks. Sounds obvious. Should have looked before asking... This doesn't look too promising:

 * Implementation restrictions: because we don't touch the table rows,
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
 * the new column values will initially appear to be NULLs. (This
 * happens because the heap tuple access routines always check for
 * attnum > # of attributes in tuple, and return NULL if so.)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Scratching my head and pondering on ... I'll be back :-)

>I'd definately be interested -- let me know if you'd like any help...

Well, currently I'm in the process of familiarizing myself with the code. That mainly takes hours of reading and searching. Anyway, thanks; I'll post here if I have questions.

Servus
Manfred
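The access rule quoted in that comment can be sketched like this. SimpleTuple and simple_getattr() are hypothetical, simplified to int columns, and are not the actual heap tuple access routines:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified tuple holding int columns only. natts is the
 * number of attributes physically stored when the tuple was written. */
typedef struct SimpleTuple {
    int         natts;
    const int  *values;   /* values[0 .. natts-1] */
    const bool *isnull;   /* isnull[0 .. natts-1] */
} SimpleTuple;

/* Fetch 1-based attribute attnum. Any attnum beyond what the tuple
 * physically holds (a column added later by ALTER TABLE ADD COLUMN)
 * is reported as NULL, as the quoted comment describes. */
static int simple_getattr(const SimpleTuple *tup, int attnum, bool *isnull)
{
    assert(attnum >= 1);
    if (attnum > tup->natts) {
        *isnull = true;
        return 0;
    }
    *isnull = tup->isnull[attnum - 1];
    return tup->values[attnum - 1];
}
```

This is why a per-tuple attribute count matters for ADD COLUMN: an old tuple with natts = 2 needs no rewrite when a third column is added, because attnum 3 simply reads as NULL.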
Manfred Koizar <mkoi-pg@aon.at> writes:
> Currently there's an int16 t_natts in HeapTupleHeaderData. This
> number is stored on disk for every single tuple. Assuming that the
> number of attributes is constant for all tuples of one relation we
> have a lot of redundancy here.

... but that's a false assumption.

No, I don't think removing 2 bytes from the header is worth making ALTER TABLE ADD COLUMN orders of magnitude slower. Especially since the actual savings will be *zero*, unless you can find another 2 bytes someplace.

> If this is doable, we arrive at 6 bytes. And what works for t_natts,
> should also work for t_hoff; that's another byte. Are we getting
> nearer?

Sorry, you used up your chance at claiming that t_hoff is dispensable. If we apply your already-submitted patch, it isn't.

The bigger picture here is that the more redundancy we squeeze out of tuple headers, the more fragile the table data structure becomes. Even if we could remove t_natts at zero runtime cost, I'd be concerned about the implications for reliability (ie, ability to detect inconsistencies) and post-crash data reconstruction. I've spent enough time staring at tuple dumps to be fairly glad that we don't run the data through a compressor ;-)

regards, tom lane
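One way to read the "savings will be *zero*" remark, sketched under stated assumptions: if the tuple header is padded up to the platform's maximum alignment before user data starts (presumably the MAXALIGN-style padding PostgreSQL applies), removing bytes only helps when it moves the header below an alignment boundary. The 31-byte header length used below is a hypothetical figure chosen for illustration, not the actual size of HeapTupleHeaderData:

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to a multiple of align (a power of two), in the spirit
 * of PostgreSQL's MAXALIGN. */
static size_t align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Bytes actually saved per tuple when `removed` bytes are dropped from a
 * header of `header_len` bytes whose end is padded to `align`. */
static size_t bytes_saved(size_t header_len, size_t removed, size_t align)
{
    assert(removed <= header_len);
    return align_up(header_len, align) - align_up(header_len - removed, align);
}
```

With a hypothetical 31-byte header and 4-byte alignment, bytes_saved(31, 2, 4) is 0 while bytes_saved(31, 4, 4) is 4; with 8-byte alignment even bytes_saved(31, 4, 8) is 0, and only bytes_saved(31, 8, 8) gets back a full 8 bytes.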
> -----Original Message-----
> From: Manfred Koizar
>
> If there is interest in reducing on-disk tuple header size and I have
> not missed any strong arguments against dropping t_natts, I'll
> investigate further. Comments?

A proper DBMS provides, from the start, a mechanism to handle ADD COLUMN without touching the tuples. If that mechanism is lost by removing t_natts (I believe it would be), I would say goodbye to PostgreSQL.

regards,
Hiroshi Inoue
On Mon, 6 May 2002 08:44:27 +0900
"Hiroshi Inoue" <Inoue@tpf.co.jp> wrote:
> > -----Original Message-----
> > From: Manfred Koizar
> >
> > If there is interest in reducing on-disk tuple header size and I have
> > not missed any strong arguments against dropping t_natts, I'll
> > investigate further. Comments?
>
> If a dbms is proper, it prepares a mechanism from the first
> to handle ADD COLUMN without touching the tuples. If the
> machanism is lost(I believe so) by removing t_natts, I would
> say good bye to PostgreSQL.

IMHO, the current ADD COLUMN mechanism is a hack. Besides requiring redundant on-disk data (t_natts), it isn't SQL compliant (because default values or NOT NULL can't be specified), and it depends on a low-level kludge (that the storage system will return NULL for any attnums > the # of the attributes stored in the tuple).

While instantaneous ADD COLUMN is nice, I think it's counterproductive not to take advantage of a storage space optimization just to preserve a feature that is already semi-broken.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
Neil Conway <nconway@klamath.dyndns.org> writes:
> IMHO, the current ADD COLUMN mechanism is a hack. Besides requiring
> redundant on-disk data (t_natts), it isn't SQL compliant (because
> default values or NOT NULL can't be specified), and depends on
> a low-level kludge (that the storage system will return NULL for
> any attnums > the # of the attributes stored in the tuple).

It could be improved if anyone felt like working on it. Hint: instead of returning NULL for col > t_natts, you could instead return whatever default value is specified for the column... at least for the case of a constant default, which is the main thing people are interested in IMHO.

regards, tom lane
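Tom's hint, sketched with hypothetical types (SimpleTuple, SimpleAttr and getattr_or_default() are illustrative only): when an attribute lies beyond what the tuple physically stores, substitute the column's constant default instead of NULL:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified tuple: natts attributes physically stored. */
typedef struct SimpleTuple {
    int         natts;
    const int  *values;   /* values[0 .. natts-1] */
    const bool *isnull;   /* isnull[0 .. natts-1] */
} SimpleTuple;

/* Hypothetical per-column catalog info: a constant default, if any. */
typedef struct SimpleAttr {
    bool has_const_default;
    int  const_default;
} SimpleAttr;

/* Like the NULL-returning rule, but for attnum > natts return the
 * column's constant default when it has one, and NULL otherwise. */
static int getattr_or_default(const SimpleTuple *tup, const SimpleAttr *attrs,
                              int attnum, bool *isnull)
{
    assert(attnum >= 1);
    if (attnum > tup->natts) {
        const SimpleAttr *att = &attrs[attnum - 1];
        *isnull = !att->has_const_default;
        return att->has_const_default ? att->const_default : 0;
    }
    *isnull = tup->isnull[attnum - 1];
    return tup->values[attnum - 1];
}
```

Because the default is looked up at read time, changing a column's default later would also change what old tuples appear to contain, so a real implementation would need to remember which value applies to rows that predate the column.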
> IMHO, the current ADD COLUMN mechanism is a hack. Besides requiring
> redundant on-disk data (t_natts), it isn't SQL compliant (because
> default values or NOT NULL can't be specified), and depends on
> a low-level kludge (that the storage system will return NULL for
> any attnums > the # of the attributes stored in the tuple).
>
> While instantaneous ADD COLUMN is nice, I think it's counter-
> productive to not take advantage of a storage space optimization
> just to preserve a feature that is already semi-broken.

I actually started working on modifying ADD COLUMN to allow NOT NULL and DEFAULT clauses. Tom's idea of having col > t_natts return the default instead of NULL is cool - I didn't think of that.

My changes would basically have made the plain ADD COLUMN we have at the moment work instantly, but if they specified NOT NULL it would touch every row. That way it's up to the DBA which one they want (as good HCI should always do).

However, now that my SET/DROP NOT NULL patch is in there, it's easy to do the whole ADD COLUMN process, just in a transaction:

    BEGIN;
    ALTER TABLE foo ADD bar int4;
    UPDATE foo SET bar=3;
    ALTER TABLE foo ALTER bar SET NOT NULL;
    ALTER TABLE foo ALTER bar SET DEFAULT 3;
    ALTER TABLE foo ADD FOREIGN KEY (bar) REFERENCES noik;
    COMMIT;

With the advantage that you have full control over every step...

Chris
I said:
> Sorry, you used up your chance at claiming that t_hoff is dispensable.
> If we apply your already-submitted patch, it isn't.

Wait, I take that back. t_hoff is important to distinguish how much bitmap padding there is on a particular tuple --- but that's really only interesting as long as we aren't forcing dump/initdb/reload. If we are changing anything else about tuple headers, then that argument becomes irrelevant anyway.

However, I'm still concerned about losing safety margin by removing "redundant" fields.

regards, tom lane
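Tom's point is that t_hoff records where user data begins, which depends on the null bitmap and its padding. A minimal sketch, assuming a single padding rule for every tuple (no older on-disk formats to read), a hypothetical 23-byte fixed header, and that the tuple's attribute count and a has-nulls flag are available, shows how that offset could be recomputed instead of stored. compute_hoff() and SKETCH_FIXED_HEADER_LEN are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size of the fixed part of a tuple header, in bytes. */
#define SKETCH_FIXED_HEADER_LEN 23

static size_t align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Recompute the header length (the role t_hoff plays): fixed header,
 * plus a null bitmap with one bit per attribute when the tuple has any
 * nulls, padded up to the platform alignment. */
static size_t compute_hoff(int natts, bool has_nulls, size_t align)
{
    assert(natts >= 0);
    size_t len = SKETCH_FIXED_HEADER_LEN;
    if (has_nulls)
        len += ((size_t)natts + 7) / 8;   /* bitmap bytes, rounded up */
    return align_up(len, align);
}
```

For example, compute_hoff(9, true, 8) gives 32 (23 + 2 bitmap bytes, padded to 8), while compute_hoff(5, false, 8) gives 24. If the padding rule ever differed between tuples, as it could for data not reloaded through dump/initdb, the stored t_hoff would be the only reliable answer.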
Neil Conway wrote:
>
> On Mon, 6 May 2002 08:44:27 +0900
> "Hiroshi Inoue" <Inoue@tpf.co.jp> wrote:
> > > -----Original Message-----
> > > From: Manfred Koizar
> > >
> > > If there is interest in reducing on-disk tuple header size and I have
> > > not missed any strong arguments against dropping t_natts, I'll
> > > investigate further. Comments?
> >
> > If a dbms is proper, it prepares a mechanism from the first
> > to handle ADD COLUMN without touching the tuples. If the
> > machanism is lost(I believe so) by removing t_natts, I would
> > say good bye to PostgreSQL.
>
> IMHO, the current ADD COLUMN mechanism is a hack. Besides requiring
> redundant on-disk data (t_natts), it isn't SQL compliant (because
> default values or NOT NULL can't be specified), and depends on
> a low-level kludge (that the storage system will return NULL for
> any attnums > the # of the attributes stored in the tuple).

I think it's neither a hack nor a kludge. The value of a column that did not exist when a tuple was written is basically unknown. So there could be an implementation of ALTER TABLE ADD COLUMN .. DEFAULT which doesn't touch existing tuples at all, as Oracle does. Though I don't object to touching tuples to implement ADD COLUMN .. DEFAULT, please don't change the existing behavior along with it.

regards,
Hiroshi Inoue
http://w2422.nsk.ne.jp/~inoue/
I think the real trick is keeping track of the difference between:

    begin;
    ALTER TABLE tab ADD COLUMN col1 int4 DEFAULT 4;
    commit;

and

    begin;
    ALTER TABLE tab ADD COLUMN col1 int4;
    ALTER TABLE tab ALTER COLUMN col1 SET DEFAULT 4;
    commit;

The first should populate the column with the value 4; the second should populate the column with NULL and give new entries a default of 4. Not to mention:

    begin;
    ALTER TABLE tab ADD COLUMN col1 int4 DEFAULT 5;
    ALTER TABLE tab ALTER COLUMN col1 SET DEFAULT 4;
    commit;

New tuples should get the default value 4, but the existing rows should be populated with 5 when the column is created.
--
Rod
----- Original Message -----
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Neil Conway" <nconway@klamath.dyndns.org>
Cc: <mkoi-pg@aon.at>; <pgsql-hackers@postgresql.org>
Sent: Monday, May 06, 2002 9:08 PM
Subject: Re: [HACKERS] Number of attributes in HeapTupleHeader
Rod Taylor wrote:
>
> I think the real trick is keeping track of the difference between:
>
> begin;
> ALTER TABLE tab ADD COLUMN col1 int4 DEFAULT 4;
> commit;
>
> and
>
> begin;
> ALTER TABLE tab ADD COLUMN col1 int4;
> ALTER TABLE tab ALTER COLUMN col1 SET DEFAULT 4;
> commit;
>
> The first should populate the column with the value of '4', the second
> should populate the column with NULL and have new entries with default
> of 4.

I know the difference. Though I don't love the standard's specification of the first, I don't object to introducing it. My only worry is that implementing the first would also replace the current implementation of ADD COLUMN (without a default) with one that touches tuples.

regards,
Hiroshi Inoue
http://w2422.nsk.ne.jp/~inoue/
On Sun, 05 May 2002 19:41:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>No, I don't think removing 2 bytes from the header is worth making
>ALTER TABLE ADD COLUMN orders of magnitude slower.

I agree. And I won't touch the code if my modifications would break an existing feature. For now I'd rather work on a patch to eliminate one of the 4 Transaction/CommandIds per tuple, as discussed in another thread. This will at least benefit those who run PG on machines with 4-byte alignment.

>The bigger picture here is that the more redundancy we squeeze out
>of tuple headers, the more fragile the table data structure becomes.
>Even if we could remove t_natts at zero runtime cost, I'd be concerned
>about the implications for reliability (ie, ability to detect
>inconsistencies) and post-crash data reconstruction. I've spent enough
>time staring at tuple dumps to be fairly glad that we don't run the
>data through a compressor ;-)

Well, that's a matter of taste. You have been around for several years and you are used to having natts in each tuple. Others might wish to have more redundant metadata in tuple headers, or less. It's hard to draw a sharp line here.

Servus
Manfred
On Mon, 6 May 2002 21:52:30 -0400, "Rod Taylor" <rbt@zort.ca> wrote:
>I think the real trick is keeping track of the difference between:
>
>begin;
>ALTER TABLE tab ADD COLUMN col1 int4 DEFAULT 4;
>commit;
>
>begin;
>ALTER TABLE tab ADD COLUMN col1 int4;
>ALTER TABLE tab ALTER COLUMN col1 SET DEFAULT 4;
>commit;
>[...]
>begin;
>ALTER TABLE tab ADD COLUMN col1 int4 DEFAULT 5;
>ALTER TABLE tab ALTER COLUMN col1 SET DEFAULT 4;
>commit;

This starts to get interesting. Wouldn't it be cool if PG could do all these ALTER TABLE statements without touching any existing tuple? This is possible; it needs a feature we could call MVMD (multi-version metadata). How could that work? I think of something like this:

An ALTER TABLE statement makes a new copy of the metadata describing the table, modifies the copy and gives it a unique (for this table) version number. It does not change or remove old metadata.

Every tuple knows the current metadata version as of the tuple's creation. Whenever a tuple is read, the correct version of the tuple descriptor is associated with it. All conversions to make the old tuple format look like the current one are done on the fly. An update is handled like an insert, so the updated tuple is converted to the most recent format.

The version number could be a small (1 byte) integer. If we maintain min and max valid versions in the table metadata, we could even allow the version to roll over to 0 after the highest possible value. Max version would be incremented by ALTER TABLE; min version could be advanced by VACUUM.

The key point to make this work is whether we can keep the runtime cost low. I think there should be no problem regarding memory footprint (just a few more tuple descriptors), but I cannot (yet) estimate the CPU overhead.

With MVMD nobody could call the handling of pre-ALTER TABLE tuples a hack or a kludge. There would be a well-defined concept. No, this concept is neither new nor is it mine.
I just like the idea, and I hope I have described it correctly. And no, I'm not whining that I think I need a feature and want you to implement it for me. I've got myself a shovel and a hoe and I'm ready to dig, as soon as the hackers agree where it makes sense. Oh, just one wish: please try to find friendly words if you have to tell me that this is all bullshit :-)

Servus
Manfred
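A minimal sketch of the version bookkeeping Manfred describes, assuming a 1-byte version per tuple and a window of valid versions that may wrap around from 255 to 0. VersionedRel, SimpleDesc and the functions are hypothetical; the on-the-fly conversion of old tuples is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuple descriptor version: only the attribute count. */
typedef struct SimpleDesc {
    int natts;
} SimpleDesc;

/* Hypothetical per-relation MVMD state: one slot per 1-byte version. */
typedef struct VersionedRel {
    uint8_t    min_version;   /* oldest version live tuples may carry; VACUUM advances it */
    uint8_t    max_version;   /* current version; ALTER TABLE increments it */
    SimpleDesc descs[256];    /* descriptor for each version number */
} VersionedRel;

/* True if v lies in the window [min_version, max_version], where the
 * window may wrap from 255 back to 0 (uint8_t casts give mod-256 math). */
static bool version_is_valid(const VersionedRel *rel, uint8_t v)
{
    uint8_t span   = (uint8_t)(rel->max_version - rel->min_version);
    uint8_t offset = (uint8_t)(v - rel->min_version);
    return offset <= span;
}

/* ALTER TABLE: copy the current descriptor, modify the copy, and make it
 * the new current version. Fails if all 256 version numbers are in use. */
static bool alter_table(VersionedRel *rel, int new_natts)
{
    uint8_t next = (uint8_t)(rel->max_version + 1);
    if (next == rel->min_version)
        return false;                 /* VACUUM must advance min_version first */
    rel->descs[next] = rel->descs[rel->max_version];
    rel->descs[next].natts = new_natts;
    rel->max_version = next;
    return true;
}

/* Reading a tuple: pick the descriptor matching the tuple's stored version. */
static const SimpleDesc *desc_for_tuple(const VersionedRel *rel, uint8_t tuple_version)
{
    assert(version_is_valid(rel, tuple_version));
    return &rel->descs[tuple_version];
}
```

The uint8_t casts make the subtraction wrap modulo 256, so a window running from 254 through 255 to 0 is handled without special cases; alter_table() refuses to reuse a version number until VACUUM has advanced min_version.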
Manfred Koizar <mkoi-pg@aon.at> writes:
> An ALTER TABLE statement makes a new copy of the metadata describing
> the table, modifies the copy and gives it a unique (for this table)
> version number. It does not change or remove old metadata.

This has been discussed before --- in PG terms, it'd mean keeping the OID of a rowtype in the tuple header. (No, I won't let you get away with a 1-byte integer. But you could remove natts and hoff, thus buying back 3 of the 4 bytes.)

I was actually going to suggest it again earlier in this thread; but people weren't excited about the idea last time it was brought up, so I decided not to bother. It'd be a *lot* of work and a lot of breakage of existing clients (eg, pg_attribute would need to link to pg_type not pg_class, pg_class.relnatts would move to pg_type, etc etc). The flexibility looks cool, but people seem to feel that the price is too high for the actual amount of usefulness.

regards, tom lane
--
Rod
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Manfred Koizar" <mkoi-pg@aon.at>
Cc: "Rod Taylor" <rbt@zort.ca>; "Hiroshi Inoue" <Inoue@tpf.co.jp>; "Neil Conway" <nconway@klamath.dyndns.org>; <pgsql-hackers@postgresql.org>
Sent: Wednesday, May 08, 2002 4:54 PM
Subject: Re: [HACKERS] Number of attributes in HeapTupleHeader

> This has been discussed before --- in PG terms, it'd mean keeping the
> OID of a rowtype in the tuple header. (No, I won't let you get away
> with a 1-byte integer. But you could remove natts and hoff, thus
> buying back 3 of the 4 bytes.)

Could the OID be stored on a per-page basis? Rather than versioning each tuple, version a page at a time? That means when you update one tuple on a page, the rest need to be tested to ensure that they have the most recent type, but it certainly makes storage requirements smaller when TOAST isn't involved (8k rows).

> I was actually going to suggest it again earlier in this thread; but
> people weren't excited about the idea last time it was brought up,
> so I decided not to bother. It'd be a *lot* of work and a lot of
> breakage of existing clients (eg, pg_attribute would need to link
> to pg_type not pg_class, pg_class.relnatts would move to pg_type,
> etc etc). The flexibility looks cool, but people seem to feel that
> the price is too high for the actual amount of usefulness.

There would be no cost if we had an information schema of some kind. Just change how the views are made. Getting everything to use the information schema in the first place is tricky though...
On Wed, 8 May 2002 17:33:08 -0400, "Rod Taylor" <rbt@zort.ca> wrote:
>From: "Tom Lane" <tgl@sss.pgh.pa.us>
>> This has been discussed before --- in PG terms, it'd mean keeping the
>> OID of a rowtype in the tuple header. (No, I won't let you get away
>> with a 1-byte integer. But you could remove natts and hoff, thus
>> buying back 3 of the 4 bytes.)
>
>Could the OID be on a per page basis? Rather than versioning each
>tuple, much with a page at a time? Means when you update one in a
>page the rest need to be tested to ensure that they have the most
>recent type, [...]

Rod, "to be tested" is not enough; they'd have to be converted, which means they could grow, thus possibly using up the free space on the page. Or did you mean to treat this just like a normal update?

I was thinking rather of some kind of translation vector: one array of rowtype OIDs per relation and one byte per tuple pointing into this array. But that has been rejected.

So it seems we are getting off topic. Initially this thread was about reducing tuple header size, and now we've arrived at increasing the size by one byte :-)

Servus
Manfred
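Manfred's objection to the per-page variant can be made concrete: before a page is switched to a newer rowtype, every tuple on it must be rewritten, and the combined growth has to fit into the page's free space. A hypothetical sketch (SimplePage, size_after_adding_int4() and page_conversion_fits() are illustrative, and alignment padding is ignored):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one heap page under a per-page rowtype scheme. */
typedef struct SimplePage {
    size_t        free_space;   /* bytes still free on the page */
    size_t        ntuples;
    const size_t *tuple_sizes;  /* current size of each tuple, in bytes */
} SimplePage;

/* Hypothetical converter: new size of a tuple once rewritten in the
 * newest format, e.g. after an added int4 column must be materialized
 * (alignment padding ignored). */
static size_t size_after_adding_int4(size_t old_size)
{
    return old_size + 4;
}

/* Would rewriting every tuple on the page to the newest rowtype fit in the
 * page's free space? Tuples that shrink are not credited, so the check is
 * conservative. */
static bool page_conversion_fits(const SimplePage *page,
                                 size_t (*converted_size)(size_t))
{
    assert(page != NULL && converted_size != NULL);
    size_t growth = 0;
    for (size_t i = 0; i < page->ntuples; i++) {
        size_t new_size = converted_size(page->tuple_sizes[i]);
        if (new_size > page->tuple_sizes[i])
            growth += new_size - page->tuple_sizes[i];
    }
    return growth <= page->free_space;
}
```

With three tuples that each grow by 4 bytes, a page with 12 free bytes can be converted in place and one with 11 cannot; in the latter case some tuples would have to move elsewhere, just like a normal update.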
Tom Lane wrote:
> I said:
> > Sorry, you used up your chance at claiming that t_hoff is dispensable.
> > If we apply your already-submitted patch, it isn't.
>
> Wait, I take that back. t_hoff is important to distinguish how much
> bitmap padding there is on a particular tuple --- but that's really
> only interesting as long as we aren't forcing dump/initdb/reload.
> If we are changing anything else about tuple headers, then that
> argument becomes irrelevant anyway.
>
> However, I'm still concerned about losing safety margin by removing
> "redundant" fields.

I just wanted to comment that redundancy in the tuple header, while adding a very marginal amount to stability, is really too high a cost. If we can save 4 bytes on every row stored, I think that is a clear win.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026