Thread: Tuple data

Tuple data

From
"Michael Richards"
Date:
Hi.

I've still got something I can't seem to get. In my test cases with simple
tables the first uint16 of tuple data after the header contained the length
of the tuple. In this case I can't seem to figure out what the value F24D
stands for when I'd expect it's length to be 0800.

The first tuple in my table has:
OID: 6155665
t_cmin: 32494973
t_cmax: 0
t_xmin: 32494324
t_xmax: 32495742
t_ctid: 55181312:82
t_infomask: A503
Bitmap: 3F00000000F2
Attributes: 7
Data Offset: 36

The flags for this tuple say:
HEAP_MOVED_IN
HEAP_UPDATED
HEAP_XMAX_COMMITTED
HEAP_XMIN_INVALID
HEAP_HASVARLENA
HEAP_HASNULL


Tuple Data:
F24D 0000 FFFF FFFF 1300 0000 4E65 7720 4D61 696C 2046 6F6C 6465 7200
9F00 0000 9F00 0000 48A2 1800

The schema is: Attribute  |    Type     |                        Modifier
-------------+-------------+------------------------------------------------
--------userid      | integer     | not nullfolderid    | integer     | not null default
nextval('folders_folderid_seq'::text)foldername  | varchar(25) |messages    | integer     |newmessages | integer
|foldersize | integer     |popinfo     | integer     |
 
Indices: folder_folderid_idx,        folders_pkey

thanks

-Michael



Re: Tuple data

From
Tom Lane
Date:
"Michael Richards" <miker@interchange.ca> writes:
> I've still got something I can't seem to get. In my test cases with simple
> tables the first uint16 of tuple data after the header contained the length
> of the tuple.

That's not right --- AFAIR there is no length in the tuple data.  You
must use the length from the 'page item' pointer that points to this
tuple if you want to know the total tuple length.

If you were testing with tables containing single varlena columns, then
you may have seen the varlena datum's length word and taken it for total
length of the tuple --- but it's only total length of that one column.

Your example dump looks like F24D 0000 is userid, FFFF FFFF is folderid,
and 1300 0000 is the varlena length word for foldername.
        regards, tom lane


Re: Tuple data

From
Hannu Krosing
Date:
Michael Richards wrote:
> 
> Hi.
> 
> I've still got something I can't seem to get. In my test cases with simple
> tables the first uint16 of tuple data after the header contained the length
> of the tuple. In this case I can't seem to figure out what the value F24D
> stands for when I'd expect it's length to be 0800.

I'm not sure, but you may see some part of the NULL bitmap. 
IIRC it started at a quite illogical place, is suspect it was at byte 31
but 
it still reserved 4bytes for each 32 fields after byte 32

> The first tuple in my table has:
...
> Bitmap: 3F00 0000 00F2
> Attributes: 7

you should have only 4 bytes of bitmap for 7 real attributes

> Data Offset: 36

thats' right  32+4

----------
Hannu


Re: Tuple data

From
"Michael Richards"
Date:
> That's not right --- AFAIR there is no length in the tuple data.  You
> must use the length from the 'page item' pointer that points to this
> tuple if you want to know the total tuple length.

Oops, I meant attribute length...

> If you were testing with tables containing single varlena columns, then
> you may have seen the varlena datum's length word and taken it for total
> length of the tuple --- but it's only total length of that one column.

Yes, I obviously had assumed that this length was common to all types (I was
testing with varchars before).

I presume then that I get the sizes based on some system tables. What query
should I run to give me the layout (in the order it's on disk) and the size
of each non-varlen attribute?

> Your example dump looks like F24D 0000 is userid, FFFF FFFF is folderid,
> and 1300 0000 is the varlena length word for foldername.

This is correct.

thanks
-Michael



Re: Tuple data

From
Hannu Krosing
Date:
Michael Richards wrote:
> 
> > That's not right --- AFAIR there is no length in the tuple data.  You
> > must use the length from the 'page item' pointer that points to this
> > tuple if you want to know the total tuple length.
> 
> Oops, I meant attribute length...
> 
> > If you were testing with tables containing single varlena columns, then
> > you may have seen the varlena datum's length word and taken it for total
> > length of the tuple --- but it's only total length of that one column.
> 
> Yes, I obviously had assumed that this length was common to all types (I was
> testing with varchars before).
> 
> I presume then that I get the sizes based on some system tables. What query
> should I run to give me the layout (in the order it's on disk) and the size
> of each non-varlen attribute?

select * from pg_attributewhere attrelid = (select oid from pg_class where relname = 'tablename')order by attnum;

then look up types by attypid to find the types or just look at attlen
==-1 for varlena types

select * from pg_type where oid = 23;     -- gives info for int type
select * from pg_type where oid = 1043;   -- varchar



--------
Hannu


Re: Tuple data

From
"Michael Richards"
Date:
> I'm not sure, but you may see some part of the NULL bitmap.
> IIRC it started at a quite illogical place, is suspect it was at byte 31
> but
> it still reserved 4bytes for each 32 fields after byte 32
Sometimes the t_hoff value in the tuple header is 32 which seems to indicate
no NULL bitmap. This really makes me wonder what happens when you ALTER
TABLE ADD COLUMN on a table since it doesn't seem to take more than O(1)
time. Perhaps it is assumed if the attribute count is less than the actual
number of attributes then the last ones are NULL and no NULL map is
required.

> > The first tuple in my table has:
> ...
> > Bitmap: 3F00 0000 00F2
> > Attributes: 7
>
> you should have only 4 bytes of bitmap for 7 real attributes

Yes you are correct, my error. To find the bitmap length I was doing:
for (int i=0;i<header->t_hoff-30;i++)
Where if I were able to count it should have been:
for (int i=0;i<header->t_hoff-32;i++)

-Michael



Re: Tuple data

From
Tom Lane
Date:
pg_attribute tells you the types and ordering of the attributes
(columns) of a table.  Then see pg_type for the size and alignment
of each type.
        regards, tom lane


Re: Tuple data

From
Tom Lane
Date:
"Michael Richards" <miker@interchange.ca> writes:
> Sometimes the t_hoff value in the tuple header is 32 which seems to indicate
> no NULL bitmap.

There's no null bitmap unless the HASNULLS infomask bit is set.

> This really makes me wonder what happens when you ALTER
> TABLE ADD COLUMN on a table since it doesn't seem to take more than O(1)
> time. Perhaps it is assumed if the attribute count is less than the actual
> number of attributes then the last ones are NULL and no NULL map is
> required.

ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
critically dependent on heap_getattr returning NULL when an attribute
beyond the number of attributes actually present in a tuple is accessed.
That's a fragile and unclean implementation IMHO --- see past traffic
on this list.
        regards, tom lane


Re: Tuple data

From
Hannu Krosing
Date:
Tom Lane wrote:
> 
>> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
> critically dependent on heap_getattr returning NULL when an attribute
> beyond the number of attributes actually present in a tuple is accessed.
> That's a fragile and unclean implementation IMHO --- see past traffic
> on this list.

Short of redesigning the whole storage format I can see no better way to
allow 
ALTER ADD COLUMN in any reasonable time. And I cna see no place where
this is 
more "fragile and unclean implementation" than any other in postgres -- 
OTOH it is quite hard for me to "see the past traffic on this list"  as
my 
"PgSQL HACKERS" mail folder is too big for anything else then grep ;)

The notion that anything not stored is NULL seems so natural to me that
it 
is very hard to find any substantial flaw or fragility with it.

--------------
Hannu


Re: Tuple data

From
Hiroshi Inoue
Date:
Hannu Krosing wrote:
> 
> Tom Lane wrote:
> >
> >> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
> > critically dependent on heap_getattr returning NULL when an attribute
> > beyond the number of attributes actually present in a tuple is accessed.
> > That's a fragile and unclean implementation IMHO --- see past traffic
> > on this list.
> 
> Short of redesigning the whole storage format I can see no better way to
> allow
> ALTER ADD COLUMN in any reasonable time. And I cna see no place where
> this is
> more "fragile and unclean implementation" than any other in postgres --
> OTOH it is quite hard for me to "see the past traffic on this list"  as
> my
> "PgSQL HACKERS" mail folder is too big for anything else then grep ;)
>

I don't remember the traffic either.
IIRC,I objected to Tom at this point in pgsql-bugs recently.
I think it's very important for dbms that ALTER ADD COLUMN
touches tuples as less as possible.

Regards.
Hiroshi Inoue


Re: Tuple data

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>>>> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
>>>> critically dependent on heap_getattr returning NULL when an attribute
>>>> beyond the number of attributes actually present in a tuple is accessed.
>>>> That's a fragile and unclean implementation IMHO --- see past traffic
>>>> on this list.

> I don't remember the traffic either.
> IIRC,I objected to Tom at this point in pgsql-bugs recently.

That was the traffic I was recalling ;-)

> I think it's very important for dbms that ALTER ADD COLUMN
> touches tuples as less as possible.

I disagree.  The existing ADD COLUMN implementation only works for
appending columns at the end of tuples; it can't handle inserting
a column.  To make it usable for inherited tables requires truly
horrendous kluges (as you well know).  IMHO we'd be far better off
to rewrite ADD COLUMN so that it does go through and change all the
tuples, and then we could get rid of the hackery that tries --- not
very successfully --- to deal with inconsistent column orders between
parent and child tables.

I have a similar opinion about DROP COLUMN ...
        regards, tom lane


Re: Tuple data

From
"Michael Richards"
Date:
Considering how often you actually change the structure of a database, I
don't mind waiting for such a reorganisation to take place, however it would
still be nice if it could be done in O(1) time because it would minimise the
amount of downtime required for structure changes.

What are the cases where the current implementation does not handle it
properly?

Restructuring all the tables (inherited too) would require either 2x the
space or lots of hackery to take care of situations where there isn't enough
room for a larger null bitmap. This hackery seems more complicated than just
having alter look for inherited tables and add the column to those as well.

You could define a flag or something so a deleted column could be so flagged
and
ALTER TABLE DELETE COLUMN
would run just as fast. Vacuum could then take care of cleaning out these
columns. If you wanted to make it really exciting, how about searching for a
deleted column for the ADD column. Touch all the tuples by zeroing that
column and finally update pg_attribute. Nothing would be more fun than 2 way
fragmentation :)

-Michael

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Hiroshi Inoue" <Inoue@tpf.co.jp>
Cc: "Hannu Krosing" <hannu@tm.ee>; "Michael Richards"
<miker@interchange.ca>; <pgsql-hackers@postgresql.org>
Sent: Sunday, December 17, 2000 8:05 PM
Subject: Re: [HACKERS] Tuple data


> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> >> Tom Lane wrote:
> >>>>> ALTER ADD COLUMN doesn't touch any tuples, and you're right that
it's
> >>>> critically dependent on heap_getattr returning NULL when an attribute
> >>>> beyond the number of attributes actually present in a tuple is
accessed.
> >>>> That's a fragile and unclean implementation IMHO --- see past traffic
> >>>> on this list.
>
> > I don't remember the traffic either.
> > IIRC,I objected to Tom at this point in pgsql-bugs recently.
>
> That was the traffic I was recalling ;-)
>
> > I think it's very important for dbms that ALTER ADD COLUMN
> > touches tuples as less as possible.
>
> I disagree.  The existing ADD COLUMN implementation only works for
> appending columns at the end of tuples; it can't handle inserting
> a column.  To make it usable for inherited tables requires truly
> horrendous kluges (as you well know).  IMHO we'd be far better off
> to rewrite ADD COLUMN so that it does go through and change all the
> tuples, and then we could get rid of the hackery that tries --- not
> very successfully --- to deal with inconsistent column orders between
> parent and child tables.
>
> I have a similar opinion about DROP COLUMN ...
>
> regards, tom lane



Re: Tuple data

From
Tom Lane
Date:
"Michael Richards" <miker@interchange.ca> writes:
> What are the cases where the current implementation does not handle it
> properly?

Inheritance.

CREATE TABLE parent (a, b, c);

CREATE TABLE child (z) INHERITS (parent);

ALTER TABLE parent ADD COLUMN (d);

With the current implementation you now have column order a,b,c,d in the
parent, and a,b,c,z,d in the child.  This is seriously broken for a
number of reasons, not least being that pg_dump can't realistically be
expected to reproduce that state.

I don't really buy the complaint about "it'll take 2x the space".  So
what?  You'll likely expend that anyway trying to load reasonable data
into the new column.  If we implemented ADD COLUMN in a less klugy
fashion, we could at least support loading a DEFAULT value into the
column (not to mention allowing it to be NOT NULL).  More to the point,
I don't think that using 2x space is a sufficient justification for the
complexity and fragility that are imposed *throughout* the system in
order to make ADD COLUMN's life easy.  You pay those hidden costs every
day you use Postgres, even if you've never done an ADD COLUMN in your
life.

> You could define a flag or something so a deleted column could be so flagged
> and ALTER TABLE DELETE COLUMN would run just as fast.

Hiroshi already tried that; you can find the vestiges of his attempt in
current sources (look for _DROP_COLUMN_HACK__).  Again, the cost to the
rest of the system strikes me as far more than I care to pay.

In the end it's a judgment call --- my judgment is that making these
features fast is not worth the implementation effort and
understandability/reliability penalties that ensue.  I think we would
be better off spending our effort on other things.
        regards, tom lane


Re: Tuple data

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> >> Tom Lane wrote:
> >>>>> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
> >>>> critically dependent on heap_getattr returning NULL when an attribute
> >>>> beyond the number of attributes actually present in a tuple is accessed.
> >>>> That's a fragile and unclean implementation IMHO --- see past traffic
> >>>> on this list.
> 
> > I don't remember the traffic either.
> > IIRC,I objected to Tom at this point in pgsql-bugs recently.
> 
> That was the traffic I was recalling ;-)
> 
> > I think it's very important for dbms that ALTER ADD COLUMN
> > touches tuples as less as possible.
> 
> I disagree.  The existing ADD COLUMN implementation only works for
> appending columns at the end of tuples; it can't handle inserting
> a column.

Column order isn't essential in rdbms.
Isn't it well known that it's not preferable to use
'select *','insert' without column list etc.. in production
applications ?

> To make it usable for inherited tables requires truly
> horrendous kluges (as you well know). 

Logical/physical attribute numbers solves it naturally.

> IMHO we'd be far better off
> to rewrite ADD COLUMN so that it does go through and change all the
> tuples, and then we could get rid of the hackery that tries --- not
> very successfully --- to deal with inconsistent column orders between
> parent and child tables.
> 

We couldn't live without ALTER ADD COLUMN and it's
very critical for me to be able to ADD COLUMN even
when the target table is at full work.
It has been one of my criteria how cool the dbms is.
Fortunately PostgreSQL has been cool but ....

Regards.
Hiroshi Inoue


Re: Tuple data

From
"Michael Richards"
Date:
This is what I assumed the problem to be but I wasn't sure if there would be
more to it or not.

My question now is: Should the order in which columns are physically stored
matter?

Since the details of where to find the columns in the tuple data are stored
in pg_attribute, I'd think this is a place where the storage layer should be
free to store it as it likes. Consider as a performance enhancement
shuffling all the variable length columns to the end of the table. This
would save having to look at the size of all the variable length columns in
order to  examine a fixed length column.

Obviously since I only have a brief understanding of how stuff works I'm
relying on you to point out whether this is even a valid suggestion.

-Michael

> Inheritance.
>
> CREATE TABLE parent (a, b, c);
>
> CREATE TABLE child (z) INHERITS (parent);
>
> ALTER TABLE parent ADD COLUMN (d);
>
> With the current implementation you now have column order a,b,c,d in the
> parent, and a,b,c,z,d in the child.  This is seriously broken for a
> number of reasons, not least being that pg_dump can't realistically be
> expected to reproduce that state.




Re: Tuple data

From
"Robert B. Easter"
Date:
On Sunday 17 December 2000 19:10, Hiroshi Inoue wrote:
> Hannu Krosing wrote:
> > Tom Lane wrote:
> > >> ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
> > >
> > > critically dependent on heap_getattr returning NULL when an attribute
> > > beyond the number of attributes actually present in a tuple is
> > > accessed. That's a fragile and unclean implementation IMHO --- see past
> > > traffic on this list.
> >
> > Short of redesigning the whole storage format I can see no better way to
> > allow
> > ALTER ADD COLUMN in any reasonable time. And I cna see no place where
> > this is
> > more "fragile and unclean implementation" than any other in postgres --
> > OTOH it is quite hard for me to "see the past traffic on this list"  as
> > my
> > "PgSQL HACKERS" mail folder is too big for anything else then grep ;)
>
> I don't remember the traffic either.

This is kind of a lame comment, but the pgsql- mail lists are archived at 
www.mail-archive.com.  You can search lots of archived mail lists there.


> IIRC,I objected to Tom at this point in pgsql-bugs recently.
> I think it's very important for dbms that ALTER ADD COLUMN
> touches tuples as less as possible.
>
> Regards.
> Hiroshi Inoue

-- 
-------- Robert B. Easter  reaster@comptechnews.com ---------
- CompTechNews Message Board   http://www.comptechnews.com/ -
- CompTechServ Tech Services   http://www.comptechserv.com/ -
---------- http://www.comptechnews.com/~reaster/ ------------


Re: Tuple data

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> To make it usable for inherited tables requires truly
>> horrendous kluges (as you well know). 

> Logical/physical attribute numbers solves it naturally.

Maybe.  At this point that's a theory without experimental evidence
to back it up ;-).  I'm still concerned about how widespread/intrusive
the changes will need to be.
        regards, tom lane


Re: Tuple data

From
"Emmanuel Charpentier,,,"
Date:
Hiroshi Inoue wrote :

[ ... ]

> Column order isn't essential in rdbms.

<Nitpicking>

A relation (a table) is a subset of the Cartesain cross-product of the 
definition domains of the attributes (columns). Cartesian product being 
a commutative operation, "order of columns" does not really exists. Period.

If you impose an order relationship, you *add* inforation to the 
structure. That may be OK, but you can't rely on relational algebra to 
guarantee your results. You'll have to manage it yourself. (And, yes, 
there is relevant algebra for this, too ...).

</Nitpicking>

> Isn't it well known that it's not preferable to use
> 'select *','insert' without column list etc.. in production
> applications ?

100% agreed. Such a notation is an abbreviation. Handy, but dangerous. 
IMHO, such checking can (should ?) be done by an algorithm checking for 
column *names* before sending the "insert" command.

A partial workaround : inserting in a view containing only the relevant 
columns, in a suitable (and known) order.

[ Back to lurking ... ]