Thread: Something's been bugging me

Something's been bugging me

From
Gregory Stark
Date:
A while back in an off-hand comment Tom packed varlenas he mentioned that we
might want to have more types of toast pointers. Since then the idea of some
alternative column-wise partitioning scheme has come up and another idea I've
been tossing around is some kind of compression scheme which takes advantage
of information across rows.

The current varvarlena scheme doesn't leave any room for any expansion at all.
Every possible value of varlena headers is meaningful and could potentially
exist in a database. So any such scheme would be end any hope of binary
compatibility in the future and probably any hope of a migrator since it would
potentially expand existing data. 

Now not every possible idea for these options would need space in the varlena
header but many would.

I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
to allow for at least a small number of constant values which could indicate
some special type of datum. That could be used to indicate that a fixed size
pointer like a toast pointer follows. That could be used for something like
common value compression. [*]

I'm almost tempted to suggest lowering it as far as 0x3f which would give us a
whole bit. That would be necessary if we wanted to allow for variable length
fields with some alternate interpretation following the header such as some
very light compression scheme. But I'm pretty loath to give up as much as that
now for only a potential future gain.

[*] Yes, this does make the idea of keeping the VARHDRSZ/VARSHDRSZ offset in
the varlen header seem pretty silly; hindsight is 20/20 and all that.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Something's been bugging me

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
> to allow for at least a small number of constant values which could indicate
> some special type of datum. That could be used to indicate that a fixed size
> pointer like a toast pointer follows. That could be used for something like
> common value compression. [*]

I'm not for this because it would complicate the already-too-complicated
inner-loop tests for deciding which form of datum you're looking at.

The idea that I recall mentioning was to expend another byte in TOAST
pointers to make them self-identifying, ie, instead of 0x80 or 0x01
signaling something that *must* be a 17-byte toast pointer, that bit
pattern signals "something else" and the content of the next byte
lets you know what.  So TOAST pointers would take 18 bytes instead of
17, and there would be room for additions of other sorts of pointers.
        regards, tom lane


Re: Something's been bugging me

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
>> to allow for at least a small number of constant values which could indicate
>> some special type of datum. That could be used to indicate that a fixed size
>> pointer like a toast pointer follows. That could be used for something like
>> common value compression. [*]
>
> I'm not for this because it would complicate the already-too-complicated
> inner-loop tests for deciding which form of datum you're looking at.
>
> The idea that I recall mentioning was to expend another byte in TOAST
> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
> signaling something that *must* be a 17-byte toast pointer, that bit
> pattern signals "something else" and the content of the next byte
> lets you know what.  So TOAST pointers would take 18 bytes instead of
> 17, and there would be room for additions of other sorts of pointers.

Hm, wouldn't that be just as expensive though? You would still have to look at
the next byte and check it against various values to see what length to skip
over. Hm, unless we put the length in the following byte. Also the difference
between (first-byte ^ 0x80 < 0x70) and (first-byte & 0x80 == 0x80) seems like
it's going to be pretty slight.

I suppose we don't have to decide now. We could just put a 1-byte padding byte
containing 0 (or 17 or 18, though I think 0 is safest) at the front of the
toast pointer structure for now -- we don't have to actually check what it
contains yet.

For that matter we could lower VARATT_SHORT_MAX so we don't generate any short
varlenas over 0x70 in length but not actually check for them in VARATT_IS_1B()
yet either.

The choice of strategy might depend on what we're trying to encode in there. I
was picturing using a single byte for up to 256 common values and for that it
might be unfortunate if we need two more bytes of overhead. On the other hand
something else I was pondering was doing some form of lz compression using
some global dictionary in which case one more byte is not going to matter at
all.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Something's been bugging me

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I'm not for this because it would complicate the already-too-complicated
>> inner-loop tests for deciding which form of datum you're looking at.
>> 
>> The idea that I recall mentioning was to expend another byte in TOAST
>> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
>> signaling something that *must* be a 17-byte toast pointer, that bit
>> pattern signals "something else" and the content of the next byte
>> lets you know what.  So TOAST pointers would take 18 bytes instead of
>> 17, and there would be room for additions of other sorts of pointers.

> Hm, wouldn't that be just as expensive though?

No, I don't think so, because it'd be a second-level test that's only
hit after determining that you have a 1B_E kind of datum.  Furthermore,
it'd only be hit if you were actually trying to determing the datum's
value, and not if (for instance) you only wanted to skip past it to the
next field.

> You would still have to look at
> the next byte and check it against various values to see what length to skip
> over. Hm, unless we put the length in the following byte.

Making the second byte contain the length would work well for the sorts
of cases I was envisioning, which was different sorts of fixed-size
pointer-ish objects.  I suppose it would not scale well towards having a
different kind of inline compression method, but I don't see how your
proposal handles that either.

After a bit of reflection, I'd argue that variant inline compression
methods ought to be implemented within the 4B_C family of datum
representations.  It looks to me like we have a couple of leftover bits
within that, because va_rawsize can't exceed 1G, and so there is room
to include two flag bits in it.
        regards, tom lane


Re: Something's been bugging me

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
>> to allow for at least a small number of constant values which could indicate
>> some special type of datum. That could be used to indicate that a fixed size
>> pointer like a toast pointer follows. That could be used for something like
>> common value compression. [*]
>
> I'm not for this because it would complicate the already-too-complicated
> inner-loop tests for deciding which form of datum you're looking at.
>
> The idea that I recall mentioning was to expend another byte in TOAST
> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
> signaling something that *must* be a 17-byte toast pointer, that bit
> pattern signals "something else" and the content of the next byte
> lets you know what.  So TOAST pointers would take 18 bytes instead of
> 17, and there would be room for additions of other sorts of pointers.

Here's a patch that does all of the above.

I accidentally included the debugging ifdef to test with textin producing
short varlenas since that was in my tree but thought I would leave it in since
it's just as easy for you to remove it as me and it might be useful to leave
in since it's in an ifdef anyways.

Two style issues: 1) the magic constant 0 could either be hard coded into
postgres.h for now or should become a VARATT_EXTERNAL_TYPE_POINTER or
something like that. 2) the test in tuptoaster.c could be

        if (toast_isnull[i] ||
            !VARATT_IS_EXTERNAL(new_value) ||
            VARSIZE_EXTERNAL(old_value) != VARSIZE_EXTERNAL(new_value) ||
            memcmp(VARDATA_SHORT(old_value),
                   VARDATA_SHORT(new_value),
                   VARSIZE_EXTERNAL(old_value)) != 0)

which optimizes away to the same thing but I thought it was clearer to leave
those niceties out for now. tuptoaster.c would probably always have to know
what type of pointers it's looking at at that point anyways.


I am sorry for leaving this until so late. It's been bugging me for a while
but I thought it was better not to make trouble. I guess seeing the release
looming near made me realize what the consequences might be of neglecting it.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Attachment

Re: Something's been bugging me

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> The idea that I recall mentioning was to expend another byte in TOAST
>> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
>> signaling something that *must* be a 17-byte toast pointer, that bit
>> pattern signals "something else" and the content of the next byte
>> lets you know what.  So TOAST pointers would take 18 bytes instead of
>> 17, and there would be room for additions of other sorts of pointers.

> Here's a patch that does all of the above. 

I'd be inclined to make the second byte be the length and have
VARSIZE_1B_E depend on that --- any objection?

> 2) the test in tuptoaster.c could be

>         if (toast_isnull[i] || 
>             !VARATT_IS_EXTERNAL(new_value) ||
>             VARSIZE_EXTERNAL(old_value) != VARSIZE_EXTERNAL(new_value) ||
>             memcmp(VARDATA_SHORT(old_value),
>                    VARDATA_SHORT(new_value),
>                    VARSIZE_EXTERNAL(old_value)) != 0)

Yeah, I'd go with this just to avoid having hardwired knowledge of the
datum size here.
        regards, tom lane


Re: Something's been bugging me

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> I'd be inclined to make the second byte be the length and have
> VARSIZE_1B_E depend on that --- any objection?

On one hand it offends me since it's hard coding an assumption that the size
of a pointer decides what it contains and vice versa. There's nothing saying
we won't have two possible special meanings for a one-byte datum.

And it forecloses any possibility of having a type whose size is at all
variable. I like your idea of using the 4-byte header for variable sized
structures, but what about structures whose size depends on an architecture
detail. We might one day have a pointer which contains a wchar_t or a bigint
and then not have any way to tell whether we have a conflict with some other
pointer structure on some architectures.

On the other hand I suppose you're concerned about the time to do a few
comparisons before knowing which length to skip over? I'm not entirely sure
cycle-counting at that level leads to the correct conclusions. In particular I
think a few checks against constant values followed by a single assignment can
actually be cheaper with speculative execution than having to copy data from
memory and then do subsequent calculations depending on it. I'm not sure we'll
every know though since I doubt it will be measurable.

(I'm also not entirely clear which length to put in, the entire length
including the header or the length just of the pointer. Personally I think I
would prefer just the pointer. I suppose that makes the macro
VARSIZE_EXTERNAL_EXHDR_EXHDR() :/ )


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Something's been bugging me

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I'd be inclined to make the second byte be the length and have
>> VARSIZE_1B_E depend on that --- any objection?

> On one hand it offends me since it's hard coding an assumption that the size
> of a pointer decides what it contains and vice versa. There's nothing saying
> we won't have two possible special meanings for a one-byte datum.

Well, what you're proposing is to treat the second byte of a 1B_E datum
as an enum value, requiring every piece of code that examines it to know
exactly what all the possible values are.  That doesn't sound like a
good idea to me.  If it does turn out to be a good idea, we could still
redefine it that way --- we'd just have assigned 18 not 0 as the
enumeration value for basic TOAST pointers.

The key point in my mind is that there is lots of performance-critical
code (the inner loops of heap_deform_tuple and friends) that needs to
determine the physical size of a datum quickly.  Interpreting the
content of a toasted datum is a completely separate and much less
performance-critical task.  If it turns out that the size is not
sufficient to tell the difference between two types of 1B_E values,
we can go over to examining the contents instead.  I note that basic
TOAST pointers start with va_rawsize which can't exceed 1G, so there
are two free bits that could be exploited in exactly the same way as
TOAST and now varvarlena have done with 4-byte datum length words.

> And it forecloses any possibility of having a type whose size is at all
> variable.

Au contraire, I think it makes it easier, at least for sizes up to 255
bytes --- you need not introduce any more complexity into VARSIZE_ANY
to have that.

> On the other hand I suppose you're concerned about the time to do a few
> comparisons before knowing which length to skip over? I'm not entirely sure
> cycle-counting at that level leads to the correct conclusions.

I am.  I have spent many many hours examining PG profiling results,
and stuff in and around the tuple-decoding loops is almost always
interesting from a performance standpoint.  Cycles spent in interpreting
a toast pointer never are (not least because you probably have to go off
and do I/O after you interpret the pointer).  I'm willing to push almost
any amount of work onto toast_fetch_datum if it'll save cycles in
VARSIZE_ANY.  But in this case you haven't even demonstrated a reason
to think that any complexity will be added there.  The likely uses for
this, in my mind, are toast pointers with wider valueid fields and toast
pointers with indicators of different compression methods, and those
seem like they'd naturally be different sizes anyway.
        regards, tom lane


Re: Something's been bugging me

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Here's a patch that does all of the above. 

Applied with tweak to use the added byte as an actual length word.

I ran into an interesting failure here on HPPA: the code the compiler
generated for copying unaligned toast pointers into aligned local
variables failed, because it was assuming halfword (2-byte) alignment of
the data to be copied!  (Instead of a memcpy call it was generating an
inline loop of ldh/sth instructions.)  Apparently gcc's thought process
is "the pointer is declared as struct varlena *, therefore must be at
least 4-aligned, therefore the data at offset 2 is at least 2-aligned".
The intermediate cast to "varattrib_1b_e *" did not prevent this; I
had to assign the datum pointer into a separate local variable of that
type to suppress the "optimization".

I'm not sure if the gcc boys would consider this a bug or not; I kinda
suspect the behavior is intentional, because otherwise they'd not be
able to optimize constructs likememcpy((char *) &foo, ...)
which is a pretty darn widespread locution.

Anyway, it seems to work now, I just thought I'd put something in the
archives about what that VARATT_EXTERNAL_GET_POINTER macro is for.
        regards, tom lane


Re: Something's been bugging me

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Apparently gcc's thought process is "the pointer is declared as struct
> varlena *, therefore must be at least 4-aligned, therefore the data at
> offset 2 is at least 2-aligned". The intermediate cast to "varattrib_1b_e *"
> did not prevent this; I had to assign the datum pointer into a separate
> local variable of that type to suppress the "optimization".

Fascinating.

Why do you cast arguments to memcmp to char* ?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Something's been bugging me

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Why do you cast arguments to memcmp to char* ?

Well, *I* haven't done it in a long time, but it used to be a fairly
standard thing.  I imagine that back before memcpy was usually declared
with void * arguments, it was necessary to avoid compiler warnings.

The problem would still exist if the code were writtenmemcpy((void *) &foo, ...)
so I suspect that an in-line cast to void * wouldn't work any better.
        regards, tom lane


Re: Something's been bugging me

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> Why do you cast arguments to memcmp to char* ?
>
> Well, *I* haven't done it in a long time, 

I'm referring to tuptoaster.c:488

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Something's been bugging me

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Gregory Stark <stark@enterprisedb.com> writes:
>>> Why do you cast arguments to memcmp to char* ?
>> 
>> Well, *I* haven't done it in a long time, 

> I'm referring to tuptoaster.c:488

Oh, I'm sorry, I thought you were talking about my hypothetical memcpy
example.

The casts in the memcmp call were because I was worried about some
compiler trying to optimize the memcmp into a word-wide operation
on the strength of the theory that struct varlena pointers must be
word-aligned.  Given the later discovery about the copying problem,
those casts may be unhelpful as far as gcc is concerned; but I'm still
inclined to leave them there in case they make a difference for some
other compiler.
        regards, tom lane


Re: Something's been bugging me

From
Florian Weimer
Date:
* Tom Lane:

> I ran into an interesting failure here on HPPA: the code the compiler
> generated for copying unaligned toast pointers into aligned local
> variables failed, because it was assuming halfword (2-byte) alignment of
> the data to be copied!  (Instead of a memcpy call it was generating an
> inline loop of ldh/sth instructions.)  Apparently gcc's thought process
> is "the pointer is declared as struct varlena *, therefore must be at
> least 4-aligned, therefore the data at offset 2 is at least 2-aligned".
> The intermediate cast to "varattrib_1b_e *" did not prevent this; I
> had to assign the datum pointer into a separate local variable of that
> type to suppress the "optimization".

This is quite deliberate, it leads to better code.  In general, once
you've cast a pointer to something which needs more alignment than
what's actually, there is no way to get away from that (except using
asm insertions as optimization barriers, of course).  This is fine
from the C semantics because undefined behavior occurs during the cast
already.

I believe that other compilers have similar rules (certainly when it
comes to aliasing), so it's better not to try to outsmart the C
standard here.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99