Thread: Tuple data
Hi. I've still got something I can't seem to get. In my test cases with simple tables the first uint16 of tuple data after the header contained the length of the tuple. In this case I can't seem to figure out what the value F24D stands for when I'd expect it's length to be 0800. The first tuple in my table has: OID: 6155665 t_cmin: 32494973 t_cmax: 0 t_xmin: 32494324 t_xmax: 32495742 t_ctid: 55181312:82 t_infomask: A503 Bitmap: 3F00000000F2 Attributes: 7 Data Offset: 36 The flags for this tuple say: HEAP_MOVED_IN HEAP_UPDATED HEAP_XMAX_COMMITTED HEAP_XMIN_INVALID HEAP_HASVARLENA HEAP_HASNULL Tuple Data: F24D 0000 FFFF FFFF 1300 0000 4E65 7720 4D61 696C 2046 6F6C 6465 7200 9F00 0000 9F00 0000 48A2 1800 The schema is: Attribute | Type | Modifier -------------+-------------+------------------------------------------------ --------userid | integer | not nullfolderid | integer | not null default nextval('folders_folderid_seq'::text)foldername | varchar(25) |messages | integer |newmessages | integer |foldersize | integer |popinfo | integer | Indices: folder_folderid_idx, folders_pkey thanks -Michael
"Michael Richards" <miker@interchange.ca> writes: > I've still got something I can't seem to get. In my test cases with simple > tables the first uint16 of tuple data after the header contained the length > of the tuple. That's not right --- AFAIR there is no length in the tuple data. You must use the length from the 'page item' pointer that points to this tuple if you want to know the total tuple length. If you were testing with tables containing single varlena columns, then you may have seen the varlena datum's length word and taken it for total length of the tuple --- but it's only total length of that one column. Your example dump looks like F24D 0000 is userid, FFFF FFFF is folderid, and 1300 0000 is the varlena length word for foldername. regards, tom lane
Michael Richards wrote: > > Hi. > > I've still got something I can't seem to get. In my test cases with simple > tables the first uint16 of tuple data after the header contained the length > of the tuple. In this case I can't seem to figure out what the value F24D > stands for when I'd expect it's length to be 0800. I'm not sure, but you may see some part of the NULL bitmap. IIRC it started at a quite illogical place, is suspect it was at byte 31 but it still reserved 4bytes for each 32 fields after byte 32 > The first tuple in my table has: ... > Bitmap: 3F00 0000 00F2 > Attributes: 7 you should have only 4 bytes of bitmap for 7 real attributes > Data Offset: 36 thats' right 32+4 ---------- Hannu
> That's not right --- AFAIR there is no length in the tuple data. You > must use the length from the 'page item' pointer that points to this > tuple if you want to know the total tuple length. Oops, I meant attribute length... > If you were testing with tables containing single varlena columns, then > you may have seen the varlena datum's length word and taken it for total > length of the tuple --- but it's only total length of that one column. Yes, I obviously had assumed that this length was common to all types (I was testing with varchars before). I presume then that I get the sizes based on some system tables. What query should I run to give me the layout (in the order it's on disk) and the size of each non-varlen attribute? > Your example dump looks like F24D 0000 is userid, FFFF FFFF is folderid, > and 1300 0000 is the varlena length word for foldername. This is correct. thanks -Michael
Michael Richards wrote: > > > That's not right --- AFAIR there is no length in the tuple data. You > > must use the length from the 'page item' pointer that points to this > > tuple if you want to know the total tuple length. > > Oops, I meant attribute length... > > > If you were testing with tables containing single varlena columns, then > > you may have seen the varlena datum's length word and taken it for total > > length of the tuple --- but it's only total length of that one column. > > Yes, I obviously had assumed that this length was common to all types (I was > testing with varchars before). > > I presume then that I get the sizes based on some system tables. What query > should I run to give me the layout (in the order it's on disk) and the size > of each non-varlen attribute? select * from pg_attributewhere attrelid = (select oid from pg_class where relname = 'tablename')order by attnum; then look up types by attypid to find the types or just look at attlen ==-1 for varlena types select * from pg_type where oid = 23; -- gives info for int type select * from pg_type where oid = 1043; -- varchar -------- Hannu
> I'm not sure, but you may see some part of the NULL bitmap. > IIRC it started at a quite illogical place, is suspect it was at byte 31 > but > it still reserved 4bytes for each 32 fields after byte 32 Sometimes the t_hoff value in the tuple header is 32 which seems to indicate no NULL bitmap. This really makes me wonder what happens when you ALTER TABLE ADD COLUMN on a table since it doesn't seem to take more than O(1) time. Perhaps it is assumed if the attribute count is less than the actual number of attributes then the last ones are NULL and no NULL map is required. > > The first tuple in my table has: > ... > > Bitmap: 3F00 0000 00F2 > > Attributes: 7 > > you should have only 4 bytes of bitmap for 7 real attributes Yes you are correct, my error. To find the bitmap length I was doing: for (int i=0;i<header->t_hoff-30;i++) Where if I were able to count it should have been: for (int i=0;i<header->t_hoff-32;i++) -Michael
pg_attribute tells you the types and ordering of the attributes (columns) of a table. Then see pg_type for the size and alignment of each type. regards, tom lane
"Michael Richards" <miker@interchange.ca> writes: > Sometimes the t_hoff value in the tuple header is 32 which seems to indicate > no NULL bitmap. There's no null bitmap unless the HASNULLS infomask bit is set. > This really makes me wonder what happens when you ALTER > TABLE ADD COLUMN on a table since it doesn't seem to take more than O(1) > time. Perhaps it is assumed if the attribute count is less than the actual > number of attributes then the last ones are NULL and no NULL map is > required. ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's critically dependent on heap_getattr returning NULL when an attribute beyond the number of attributes actually present in a tuple is accessed. That's a fragile and unclean implementation IMHO --- see past traffic on this list. regards, tom lane
Tom Lane wrote: > >> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's > critically dependent on heap_getattr returning NULL when an attribute > beyond the number of attributes actually present in a tuple is accessed. > That's a fragile and unclean implementation IMHO --- see past traffic > on this list. Short of redesigning the whole storage format I can see no better way to allow ALTER ADD COLUMN in any reasonable time. And I cna see no place where this is more "fragile and unclean implementation" than any other in postgres -- OTOH it is quite hard for me to "see the past traffic on this list" as my "PgSQL HACKERS" mail folder is too big for anything else then grep ;) The notion that anything not stored is NULL seems so natural to me that it is very hard to find any substantial flaw or fragility with it. -------------- Hannu
Hannu Krosing wrote: > > Tom Lane wrote: > > > >> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's > > critically dependent on heap_getattr returning NULL when an attribute > > beyond the number of attributes actually present in a tuple is accessed. > > That's a fragile and unclean implementation IMHO --- see past traffic > > on this list. > > Short of redesigning the whole storage format I can see no better way to > allow > ALTER ADD COLUMN in any reasonable time. And I cna see no place where > this is > more "fragile and unclean implementation" than any other in postgres -- > OTOH it is quite hard for me to "see the past traffic on this list" as > my > "PgSQL HACKERS" mail folder is too big for anything else then grep ;) > I don't remember the traffic either. IIRC,I objected to Tom at this point in pgsql-bugs recently. I think it's very important for dbms that ALTER ADD COLUMN touches tuples as less as possible. Regards. Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: >> Tom Lane wrote: >>>>> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's >>>> critically dependent on heap_getattr returning NULL when an attribute >>>> beyond the number of attributes actually present in a tuple is accessed. >>>> That's a fragile and unclean implementation IMHO --- see past traffic >>>> on this list. > I don't remember the traffic either. > IIRC,I objected to Tom at this point in pgsql-bugs recently. That was the traffic I was recalling ;-) > I think it's very important for dbms that ALTER ADD COLUMN > touches tuples as less as possible. I disagree. The existing ADD COLUMN implementation only works for appending columns at the end of tuples; it can't handle inserting a column. To make it usable for inherited tables requires truly horrendous kluges (as you well know). IMHO we'd be far better off to rewrite ADD COLUMN so that it does go through and change all the tuples, and then we could get rid of the hackery that tries --- not very successfully --- to deal with inconsistent column orders between parent and child tables. I have a similar opinion about DROP COLUMN ... regards, tom lane
Considering how often you actually change the structure of a database, I don't mind waiting for such a reorganisation to take place, however it would still be nice if it could be done in O(1) time because it would minimise the amount of downtime required for structure changes. What are the cases where the current implementation does not handle it properly? Restructuring all the tables (inherited too) would require either 2x the space or lots of hackery to take care of situations where there isn't enough room for a larger null bitmap. This hackery seems more complicated than just having alter look for inherited tables and add the column to those as well. You could define a flag or something so a deleted column could be so flagged and ALTER TABLE DELETE COLUMN would run just as fast. Vacuum could then take care of cleaning out these columns. If you wanted to make it really exciting, how about searching for a deleted column for the ADD column. Touch all the tuples by zeroing that column and finally update pg_attribute. Nothing would be more fun than 2 way fragmentation :) -Michael ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Hiroshi Inoue" <Inoue@tpf.co.jp> Cc: "Hannu Krosing" <hannu@tm.ee>; "Michael Richards" <miker@interchange.ca>; <pgsql-hackers@postgresql.org> Sent: Sunday, December 17, 2000 8:05 PM Subject: Re: [HACKERS] Tuple data > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > >> Tom Lane wrote: > >>>>> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's > >>>> critically dependent on heap_getattr returning NULL when an attribute > >>>> beyond the number of attributes actually present in a tuple is accessed. > >>>> That's a fragile and unclean implementation IMHO --- see past traffic > >>>> on this list. > > > I don't remember the traffic either. > > IIRC,I objected to Tom at this point in pgsql-bugs recently. > > That was the traffic I was recalling ;-) > > > I think it's very important for dbms that ALTER ADD COLUMN > > touches tuples as less as possible. > > I disagree. The existing ADD COLUMN implementation only works for > appending columns at the end of tuples; it can't handle inserting > a column. To make it usable for inherited tables requires truly > horrendous kluges (as you well know). IMHO we'd be far better off > to rewrite ADD COLUMN so that it does go through and change all the > tuples, and then we could get rid of the hackery that tries --- not > very successfully --- to deal with inconsistent column orders between > parent and child tables. > > I have a similar opinion about DROP COLUMN ... > > regards, tom lane
"Michael Richards" <miker@interchange.ca> writes: > What are the cases where the current implementation does not handle it > properly? Inheritance. CREATE TABLE parent (a, b, c); CREATE TABLE child (z) INHERITS (parent); ALTER TABLE parent ADD COLUMN (d); With the current implementation you now have column order a,b,c,d in the parent, and a,b,c,z,d in the child. This is seriously broken for a number of reasons, not least being that pg_dump can't realistically be expected to reproduce that state. I don't really buy the complaint about "it'll take 2x the space". So what? You'll likely expend that anyway trying to load reasonable data into the new column. If we implemented ADD COLUMN in a less klugy fashion, we could at least support loading a DEFAULT value into the column (not to mention allowing it to be NOT NULL). More to the point, I don't think that using 2x space is a sufficient justification for the complexity and fragility that are imposed *throughout* the system in order to make ADD COLUMN's life easy. You pay those hidden costs every day you use Postgres, even if you've never done an ADD COLUMN in your life. > You could define a flag or something so a deleted column could be so flagged > and ALTER TABLE DELETE COLUMN would run just as fast. Hiroshi already tried that; you can find the vestiges of his attempt in current sources (look for _DROP_COLUMN_HACK__). Again, the cost to the rest of the system strikes me as far more than I care to pay. In the end it's a judgment call --- my judgment is that making these features fast is not worth the implementation effort and understandability/reliability penalties that ensue. I think we would be better off spending our effort on other things. regards, tom lane
Tom Lane wrote: > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > >> Tom Lane wrote: > >>>>> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's > >>>> critically dependent on heap_getattr returning NULL when an attribute > >>>> beyond the number of attributes actually present in a tuple is accessed. > >>>> That's a fragile and unclean implementation IMHO --- see past traffic > >>>> on this list. > > > I don't remember the traffic either. > > IIRC,I objected to Tom at this point in pgsql-bugs recently. > > That was the traffic I was recalling ;-) > > > I think it's very important for dbms that ALTER ADD COLUMN > > touches tuples as less as possible. > > I disagree. The existing ADD COLUMN implementation only works for > appending columns at the end of tuples; it can't handle inserting > a column. Column order isn't essential in rdbms. Isn't it well known that it's not preferable to use 'select *','insert' without column list etc.. in production applications ? > To make it usable for inherited tables requires truly > horrendous kluges (as you well know). Logical/physical attribute numbers solves it naturally. > IMHO we'd be far better off > to rewrite ADD COLUMN so that it does go through and change all the > tuples, and then we could get rid of the hackery that tries --- not > very successfully --- to deal with inconsistent column orders between > parent and child tables. > We couldn't live without ALTER ADD COLUMN and it's very critical for me to be able to ADD COLUMN even when the target table is at full work. It has been one of my criteria how cool the dbms is. Fortunately PostgreSQL has been cool but .... Regards. Hiroshi Inoue
This is what I assumed the problem to be but I wasn't sure if there would be more to it or not. My question now is: Should the order in which columns are physically stored matter? Since the details of where to find the columns in the tuple data are stored in pg_attribute, I'd think this is a place where the storage layer should be free to store it as it likes. Consider as a performance enhancement shuffling all the variable length columns to the end of the table. This would save having to look at the size of all the variable length columns in order to examine a fixed length column. Obviously since I only have a brief understanding of how stuff works I'm relying on you to point out whether this is even a valid suggestion. -Michael > Inheritance. > > CREATE TABLE parent (a, b, c); > > CREATE TABLE child (z) INHERITS (parent); > > ALTER TABLE parent ADD COLUMN (d); > > With the current implementation you now have column order a,b,c,d in the > parent, and a,b,c,z,d in the child. This is seriously broken for a > number of reasons, not least being that pg_dump can't realistically be > expected to reproduce that state.
On Sunday 17 December 2000 19:10, Hiroshi Inoue wrote: > Hannu Krosing wrote: > > Tom Lane wrote: > > >> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's > > > > > > critically dependent on heap_getattr returning NULL when an attribute > > > beyond the number of attributes actually present in a tuple is > > > accessed. That's a fragile and unclean implementation IMHO --- see past > > > traffic on this list. > > > > Short of redesigning the whole storage format I can see no better way to > > allow > > ALTER ADD COLUMN in any reasonable time. And I cna see no place where > > this is > > more "fragile and unclean implementation" than any other in postgres -- > > OTOH it is quite hard for me to "see the past traffic on this list" as > > my > > "PgSQL HACKERS" mail folder is too big for anything else then grep ;) > > I don't remember the traffic either. This is kind of a lame comment, but the pgsql- mail lists are archived at www.mail-archive.com. You can search lots of archived mail lists there. > IIRC,I objected to Tom at this point in pgsql-bugs recently. > I think it's very important for dbms that ALTER ADD COLUMN > touches tuples as less as possible. > > Regards. > Hiroshi Inoue -- -------- Robert B. Easter reaster@comptechnews.com --------- - CompTechNews Message Board http://www.comptechnews.com/ - - CompTechServ Tech Services http://www.comptechserv.com/ - ---------- http://www.comptechnews.com/~reaster/ ------------
Hiroshi Inoue <Inoue@tpf.co.jp> writes: >> To make it usable for inherited tables requires truly >> horrendous kluges (as you well know). > Logical/physical attribute numbers solves it naturally. Maybe. At this point that's a theory without experimental evidence to back it up ;-). I'm still concerned about how widespread/intrusive the changes will need to be. regards, tom lane
Hiroshi Inoue wrote : [ ... ] > Column order isn't essential in rdbms. <Nitpicking> A relation (a table) is a subset of the Cartesain cross-product of the definition domains of the attributes (columns). Cartesian product being a commutative operation, "order of columns" does not really exists. Period. If you impose an order relationship, you *add* inforation to the structure. That may be OK, but you can't rely on relational algebra to guarantee your results. You'll have to manage it yourself. (And, yes, there is relevant algebra for this, too ...). </Nitpicking> > Isn't it well known that it's not preferable to use > 'select *','insert' without column list etc.. in production > applications ? 100% agreed. Such a notation is an abbreviation. Handy, but dangerous. IMHO, such checking can (should ?) be done by an algorithm checking for column *names* before sending the "insert" command. A partial workaround : inserting in a view containing only the relevant columns, in a suitable (and known) order. [ Back to lurking ... ]