Thread: Short varlena headers and arrays

From: Gregory Stark
Date:
I had intended to give varlenas alignment 'c' and have heaptuple.c force them
to alignment 'i' when they required it. However, I've noticed a problem that
makes me think I should do it the other way around.

The problem is that other places in the codebase use the alignment, arrays in
particular. Also, toasting.c expects att_align to give it a worst-case size
rather than a best-case one. Then there's indextuple.c, though I should
probably get to that in this round anyway.
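Why a worst-case caller cares about the declared alignment can be sketched as follows. This is a minimal illustration, not toasting.c's code: `align_up`, `ColumnEstimate` and `estimate_size` are hypothetical names, and the helper only mimics what an att_align-style macro does (round an offset up to the declared alignment). If varlenas were declared 'c' but heaptuple.c later padded 4-byte-header values to 'i', an estimate built from the declared alignment would come out too small.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of align (a power of two), in the
 * spirit of an att_align-style macro; this helper is hypothetical. */
size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Hypothetical per-column description used only for this sketch. */
typedef struct
{
    size_t declared_align;  /* 1 for typalign 'c', 4 for typalign 'i' */
    size_t max_size;        /* worst-case stored size, header included */
} ColumnEstimate;

/* Estimate the data size by padding each column to its *declared*
 * alignment, as a caller wanting an upper bound would. */
size_t estimate_size(const ColumnEstimate *cols, int ncols, size_t start)
{
    size_t off = start;

    for (int i = 0; i < ncols; i++)
    {
        off = align_up(off, cols[i].declared_align);
        off += cols[i].max_size;
    }
    return off;
}
```

For two varlena columns of at most 5 bytes each starting at offset 1, declaring them 'i' gives 17 bytes, while declaring them 'c' gives 11, which undercounts by the 6 padding bytes heaptuple.c would actually insert.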

So now I'm thinking it's best to leave them with alignment 'i' unless
heaptuple.c decides it can get away without aligning them. This means we don't
have a convenient way for data types to opt out of this header compression.
But the more I think about it, the less convinced I am that we need one. The
alignment inside the data type doesn't matter, since you'll only ever be
working with detoasted copies unless you specifically go out of your way to
do otherwise.
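The placement rule proposed above (keep 'i', but skip the padding when a value can take a short header) might look roughly like this. It is a sketch under stated assumptions: the 1-byte header size and the 127-byte maximum total length are assumed values for illustration, and `fits_short_header` and `place_varlena` are hypothetical helpers, not heaptuple.c functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LONG_HDR_SIZE   4     /* ordinary 4-byte varlena header */
#define SHORT_HDR_SIZE  1     /* proposed 1-byte header (assumption) */
#define SHORT_MAX_TOTAL 127   /* assumed max total size with a 1-byte header */
#define INT_ALIGN       4     /* typalign 'i' */

/* Hypothetical: can a value with this many payload bytes use a short header? */
bool fits_short_header(size_t payload)
{
    return payload + SHORT_HDR_SIZE <= SHORT_MAX_TOTAL;
}

/* Place one varlena at offset off in the tuple data area and return the
 * offset just past it. The column stays declared 'i'; the padding is
 * skipped only when the value is stored with a 1-byte header. */
size_t place_varlena(size_t off, size_t payload)
{
    if (fits_short_header(payload))
        return off + SHORT_HDR_SIZE + payload;   /* unaligned, 1-byte header */

    off = (off + INT_ALIGN - 1) & ~(size_t) (INT_ALIGN - 1);
    return off + LONG_HDR_SIZE + payload;        /* padded, 4-byte header */
}
```

Under these assumptions a 1-byte payload placed at offset 1 ends at offset 3, whereas a 200-byte payload is padded to offset 4 first and ends at 208.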


Once this is done it may be worth having arrays convert to short varlenas as
well. Arrays of short strings hurt pretty badly currently:

postgres=# select pg_column_size(array['a','b','c','d']);
 pg_column_size
----------------
             56
(1 row)
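One way to account for those 56 bytes, assuming the 8.2-era array layout: a 16-byte fixed header (length word, ndim, dataoffset, element type OID), 8 bytes of dimension and lower-bound data for one dimension, then each text element stored as a 4-byte header plus its payload, padded to 'i'. Each 1-character string thus costs 8 bytes. The second figure below is hypothetical: it is what the same array would take if elements used a 1-byte header with no padding. `text_array_size` is an illustrative helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_FIXED_HDR 16  /* vl_len, ndim, dataoffset, elemtype: 4 bytes each */
#define PER_DIM_BYTES    8  /* one int dimension + one int lower bound */
#define MAXALIGN_OF      8  /* assumed maximum alignment */

size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Size of a 1-D text array of nelems elements, each with len payload bytes.
 * hdr is the per-element header size; elem_align is the padding applied
 * after each element. */
size_t text_array_size(int nelems, size_t len, size_t hdr, size_t elem_align)
{
    size_t nbytes = align_up(ARRAY_FIXED_HDR + PER_DIM_BYTES, MAXALIGN_OF);

    for (int i = 0; i < nelems; i++)
    {
        nbytes += hdr + len;                  /* header + payload */
        nbytes = align_up(nbytes, elem_align);
    }
    return nbytes;
}
```

With 4-byte headers and 'i' padding, four 1-character elements give 24 + 4 * 8 = 56 bytes; with hypothetical 1-byte headers and no padding they would give 24 + 4 * 2 = 32.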

The only risk is if it's more likely for someone to stuff things into an array
and then read them back out without detoasting than it is for someone to stuff
them into a tuple and do the same. Probably the risk is the same. There is
some code in execQual.c and varlena.c that assumes it understands how arrays
are laid out.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Short varlena headers and arrays

From: Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Once this is done it may be worth having arrays convert to short varlenas as
> well.

Elements of arrays are not subject to being toasted by themselves, so
I don't think you can make that work.  At least not without breaking
wide swaths of code that works fine today.
        regards, tom lane


Re: Short varlena headers and arrays

From: Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> Once this is done it may be worth having arrays convert to short varlenas as
>> well.
>
> Elements of arrays are not subject to being toasted by themselves, so
> I don't think you can make that work.  At least not without breaking
> wide swaths of code that works fine today.

You think it's more likely there are places that build arrays and then read
the items back without passing through detoast than there are places that
build tuples and do so?



Btw, I ran into some problems with system tables. Many of them are read using
the GETSTRUCT method, and with that method the first varlena field has to be
safely accessible, so I would have to avoid skipping the alignment for the
first varlena field in system tables. Instead I just punt on all system
tables. The only one where it looks like we'll lose out is pg_statistic, and
there the biggest problem is the space wasted inside the arrays, not the
padding before the varlena fields.
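Why the first varlena in a catalog row must stay aligned can be sketched with a struct overlay. GETSTRUCT returns a pointer to the tuple's data area, which catalog code then treats as a C struct, so the compiler-chosen offset of the first varlena member must match where heaptuple.c put it. The struct below is hypothetical (`FakeVarlena` and `FormData_fake_catalog` are not real catalog definitions); it only shows that the compiler pads the first varlena to 'i' while a skipped-alignment layout would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical varlena shape: 4-byte header, payload follows. */
typedef struct
{
    int32_t vl_len;      /* 4-byte varlena header */
    char    vl_dat[1];   /* payload follows */
} FakeVarlena;

/* Hypothetical catalog row overlaid on the tuple data area. */
typedef struct
{
    uint32_t    relid;      /* fixed-width field */
    int16_t     attnum;     /* fixed-width field */
    char        flag;       /* 1-byte field leaves the offset unaligned */
    FakeVarlena firstvar;   /* first varlena: compiler pads it to 'i' */
} FormData_fake_catalog;

/* Offset the first varlena would get if its alignment were skipped. */
size_t unaligned_offset(void)
{
    return sizeof(uint32_t) + sizeof(int16_t) + sizeof(char);
}
```

On common platforms the struct places `firstvar` at offset 8, while a short-header layout would have put it at offset 7, so a struct-based read would look one byte too far.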

Also, int2vector and oidvector don't expect to be toasted, so I've skipped
them as well. If we want an escape hatch, they would have to be marked as
such. For now I've just hard-coded them.
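The two exclusions described above can be sketched as a single policy check. Everything here is an assumption for illustration: `may_use_short_header` is a hypothetical helper, and treating relation OIDs below FirstNormalObjectId (16384) as system tables is a stand-in for whatever system-relation test the patch actually uses. The type OIDs 22 (int2vector) and 30 (oidvector) are the ones assigned in pg_type.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

#define INT2VECTOROID 22
#define OIDVECTOROID  30

/* Assumed boundary: OIDs below this belong to system objects. */
#define FirstNormalObjectId 16384

/* Hypothetical check: may this column be stored with a short header? */
bool may_use_short_header(Oid relid, Oid atttypid, int16_t attlen)
{
    if (attlen != -1)                   /* only varlenas have headers to shrink */
        return false;
    if (relid < FirstNormalObjectId)    /* punt on all system tables */
        return false;
    if (atttypid == INT2VECTOROID || atttypid == OIDVECTOROID)
        return false;                   /* hard-coded: never toasted */
    return true;
}
```

A per-type escape hatch would replace the hard-coded OID comparison with a flag looked up for the type, rather than naming int2vector and oidvector explicitly.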



Re: Short varlena headers and arrays

From: Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Elements of arrays are not subject to being toasted by themselves, so
>> I don't think you can make that work.  At least not without breaking
>> wide swaths of code that works fine today.

> You think it's more likely there are places that build arrays and then read
> the items back without passing through detoast than there are places that
> build tuples and do so?

The former is valid per the coding rules, the latter is not, so...
        regards, tom lane
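Tom's distinction can be illustrated with a toy detoast step. Under the coding rules, a value fetched from a tuple must pass through detoasting before a function inspects it, so a short header there would be expanded back to the 4-byte form; array elements carry no such obligation, so code may read them raw. The header encoding below is invented for this sketch only (1-byte header: high bit set, low 7 bits hold the total length; 4-byte header: big-endian total length with the high bit clear), and `is_short`, `total_size` and `detoast` are hypothetical names, not PostgreSQL's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch-only encoding: high bit of the first byte marks a 1-byte header. */
int is_short(const unsigned char *p)
{
    return (p[0] & 0x80) != 0;
}

/* Total size of the value, header included, under the sketch encoding. */
size_t total_size(const unsigned char *p)
{
    if (is_short(p))
        return p[0] & 0x7F;
    return ((size_t) p[0] << 24) | ((size_t) p[1] << 16) |
           ((size_t) p[2] << 8) | p[3];
}

/* Return a malloc'd copy that always has a 4-byte header, so callers that
 * follow the detoast rule never see the short form. */
unsigned char *detoast(const unsigned char *p)
{
    size_t hdr = is_short(p) ? 1 : 4;
    size_t payload = total_size(p) - hdr;
    size_t total = 4 + payload;
    unsigned char *out = malloc(total);

    if (out == NULL)
        return NULL;
    out[0] = (unsigned char) (total >> 24);
    out[1] = (unsigned char) (total >> 16);
    out[2] = (unsigned char) (total >> 8);
    out[3] = (unsigned char) total;
    memcpy(out + 4, p + hdr, payload);
    return out;
}
```

A 3-byte short value holding "hi" comes back as a 6-byte value with a 4-byte header. Code reading array elements directly never calls such a step, which is why short headers inside arrays would break it.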