Thread: Something's been bugging me
A while back, in an off-hand comment about packed varlenas, Tom mentioned that
we might want to have more types of toast pointers. Since then the idea of
some alternative column-wise partitioning scheme has come up, and another idea
I've been tossing around is some kind of compression scheme which takes
advantage of information across rows.

The current varvarlena scheme doesn't leave any room for expansion at all.
Every possible value of the varlena header is meaningful and could potentially
exist in a database. So any such scheme would end any hope of binary
compatibility in the future, and probably any hope of a migrator, since it
would potentially expand existing data.

Not every possible idea for these options would need space in the varlena
header, but many would. I'm wondering whether it doesn't make sense to lower
VARATT_SHORT_MAX to 0x70 to allow for at least a small number of constant
values which could indicate some special type of datum. That could be used to
indicate that a fixed-size pointer like a toast pointer follows, or for
something like common value compression. [*]

I'm almost tempted to suggest lowering it as far as 0x3f, which would give us
a whole bit. That would be necessary if we wanted to allow for variable-length
fields with some alternate interpretation following the header, such as some
very light compression scheme. But I'm pretty loath to give up that much now
for only a potential future gain.

[*] Yes, this does make the idea of keeping the VARHDRSZ/VARSHDRSZ offset in
the varlena header seem pretty silly; hindsight is 20/20 and all that.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
> I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
> to allow for at least a small number of constant values which could indicate
> some special type of datum. That could be used to indicate that a fixed size
> pointer like a toast pointer follows. That could be used for something like
> common value compression. [*]

I'm not for this because it would complicate the already-too-complicated
inner-loop tests for deciding which form of datum you're looking at.

The idea that I recall mentioning was to expend another byte in TOAST
pointers to make them self-identifying, ie, instead of 0x80 or 0x01
signaling something that *must* be a 17-byte toast pointer, that bit
pattern signals "something else" and the content of the next byte
lets you know what. So TOAST pointers would take 18 bytes instead of
17, and there would be room for additions of other sorts of pointers.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
>> I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
>> to allow for at least a small number of constant values which could indicate
>> some special type of datum. That could be used to indicate that a fixed size
>> pointer like a toast pointer follows. That could be used for something like
>> common value compression. [*]
>
> I'm not for this because it would complicate the already-too-complicated
> inner-loop tests for deciding which form of datum you're looking at.
>
> The idea that I recall mentioning was to expend another byte in TOAST
> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
> signaling something that *must* be a 17-byte toast pointer, that bit
> pattern signals "something else" and the content of the next byte
> lets you know what. So TOAST pointers would take 18 bytes instead of
> 17, and there would be room for additions of other sorts of pointers.

Hm, wouldn't that be just as expensive though? You would still have to look
at the next byte and check it against various values to see what length to
skip over. Hm, unless we put the length in the following byte.

Also, the difference between (first-byte ^ 0x80 < 0x70) and
(first-byte & 0x80 == 0x80) seems like it's going to be pretty slight.

I suppose we don't have to decide now. We could just put a 1-byte padding
byte containing 0 (or 17 or 18, though I think 0 is safest) at the front of
the toast pointer structure for now -- we don't have to actually check what
it contains yet. For that matter we could lower VARATT_SHORT_MAX so we don't
generate any short varlenas over 0x70 in length, but not actually check for
them in VARATT_IS_1B() yet either.

The choice of strategy might depend on what we're trying to encode in there.
I was picturing using a single byte for up to 256 common values, and for that
it might be unfortunate if we needed two more bytes of overhead.
On the other hand, something else I was pondering was doing some form of LZ
compression using some global dictionary, in which case one more byte is not
going to matter at all.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I'm not for this because it would complicate the already-too-complicated
>> inner-loop tests for deciding which form of datum you're looking at.
>>
>> The idea that I recall mentioning was to expend another byte in TOAST
>> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
>> signaling something that *must* be a 17-byte toast pointer, that bit
>> pattern signals "something else" and the content of the next byte
>> lets you know what. So TOAST pointers would take 18 bytes instead of
>> 17, and there would be room for additions of other sorts of pointers.

> Hm, wouldn't that be just as expensive though?

No, I don't think so, because it'd be a second-level test that's only hit
after determining that you have a 1B_E kind of datum. Furthermore, it'd only
be hit if you were actually trying to determine the datum's value, and not if
(for instance) you only wanted to skip past it to the next field.

> You would still have to look at
> the next byte and check it against various values to see what length to skip
> over. Hm, unless we put the length in the following byte.

Making the second byte contain the length would work well for the sorts of
cases I was envisioning, which were different sorts of fixed-size pointer-ish
objects. I suppose it would not scale well towards having a different kind of
inline compression method, but I don't see how your proposal handles that
either.

After a bit of reflection, I'd argue that variant inline compression methods
ought to be implemented within the 4B_C family of datum representations. It
looks to me like we have a couple of leftover bits within that, because
va_rawsize can't exceed 1G, and so there is room to include two flag bits
in it.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
>> I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
>> to allow for at least a small number of constant values which could indicate
>> some special type of datum. That could be used to indicate that a fixed size
>> pointer like a toast pointer follows. That could be used for something like
>> common value compression. [*]
>
> I'm not for this because it would complicate the already-too-complicated
> inner-loop tests for deciding which form of datum you're looking at.
>
> The idea that I recall mentioning was to expend another byte in TOAST
> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
> signaling something that *must* be a 17-byte toast pointer, that bit
> pattern signals "something else" and the content of the next byte
> lets you know what. So TOAST pointers would take 18 bytes instead of
> 17, and there would be room for additions of other sorts of pointers.

Here's a patch that does all of the above.

I accidentally included the debugging ifdef to test with textin producing
short varlenas, since that was in my tree, but I thought I would leave it in:
it's just as easy for you to remove as for me, and it might be useful to keep
since it's inside an ifdef anyway.

Two style issues:

1) The magic constant 0 could either be hard-coded into postgres.h for now,
   or should become a VARATT_EXTERNAL_TYPE_POINTER or something like that.

2) The test in tuptoaster.c could be

	if (toast_isnull[i] ||
		!VARATT_IS_EXTERNAL(new_value) ||
		VARSIZE_EXTERNAL(old_value) != VARSIZE_EXTERNAL(new_value) ||
		memcmp(VARDATA_SHORT(old_value),
			   VARDATA_SHORT(new_value),
			   VARSIZE_EXTERNAL(old_value)) != 0)

   which optimizes away to the same thing, but I thought it was clearer to
   leave those niceties out for now. tuptoaster.c would probably always have
   to know what type of pointers it's looking at at that point anyway.

I am sorry for leaving this until so late.
It's been bugging me for a while, but I thought it was better not to make
trouble. I guess seeing the release looming near made me realize what the
consequences might be of neglecting it.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Attachment
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> The idea that I recall mentioning was to expend another byte in TOAST
>> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
>> signaling something that *must* be a 17-byte toast pointer, that bit
>> pattern signals "something else" and the content of the next byte
>> lets you know what. So TOAST pointers would take 18 bytes instead of
>> 17, and there would be room for additions of other sorts of pointers.

> Here's a patch that does all of the above.

I'd be inclined to make the second byte be the length and have
VARSIZE_1B_E depend on that --- any objection?

> 2) the test in tuptoaster.c could be
>
> 	if (toast_isnull[i] ||
> 		!VARATT_IS_EXTERNAL(new_value) ||
> 		VARSIZE_EXTERNAL(old_value) != VARSIZE_EXTERNAL(new_value) ||
> 		memcmp(VARDATA_SHORT(old_value),
> 			   VARDATA_SHORT(new_value),
> 			   VARSIZE_EXTERNAL(old_value)) != 0)

Yeah, I'd go with this just to avoid having hardwired knowledge of the
datum size here.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> I'd be inclined to make the second byte be the length and have
> VARSIZE_1B_E depend on that --- any objection?

On one hand it offends me, since it hard-codes an assumption that the size of
a pointer decides what it contains and vice versa. There's nothing saying we
won't have two possible special meanings for a one-byte datum. And it
forecloses any possibility of having a type whose size is at all variable. I
like your idea of using the 4-byte header for variable-sized structures, but
what about structures whose size depends on an architecture detail? We might
one day have a pointer which contains a wchar_t or a bigint and then not have
any way to tell whether we have a conflict with some other pointer structure
on some architectures.

On the other hand, I suppose you're concerned about the time to do a few
comparisons before knowing which length to skip over? I'm not entirely sure
cycle-counting at that level leads to the correct conclusions. In particular,
I think a few checks against constant values followed by a single assignment
can actually be cheaper with speculative execution than having to copy data
from memory and then do subsequent calculations depending on it. I'm not sure
we'll ever know, though, since I doubt it will be measurable.

(I'm also not entirely clear which length to put in: the entire length
including the header, or the length of just the pointer. Personally I think I
would prefer just the pointer. I suppose that makes the macro
VARSIZE_EXTERNAL_EXHDR_EXHDR() :/ )

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I'd be inclined to make the second byte be the length and have
>> VARSIZE_1B_E depend on that --- any objection?

> On one hand it offends me since it's hard coding an assumption that the size
> of a pointer decides what it contains and vice versa. There's nothing saying
> we won't have two possible special meanings for a one-byte datum.

Well, what you're proposing is to treat the second byte of a 1B_E datum as an
enum value, requiring every piece of code that examines it to know exactly
what all the possible values are. That doesn't sound like a good idea to me.
If it does turn out to be a good idea, we could still redefine it that way
--- we'd just have assigned 18 not 0 as the enumeration value for basic TOAST
pointers.

The key point in my mind is that there is lots of performance-critical code
(the inner loops of heap_deform_tuple and friends) that needs to determine
the physical size of a datum quickly. Interpreting the content of a toasted
datum is a completely separate and much less performance-critical task. If it
turns out that the size is not sufficient to tell the difference between two
types of 1B_E values, we can go over to examining the contents instead. I
note that basic TOAST pointers start with va_rawsize, which can't exceed 1G,
so there are two free bits that could be exploited in exactly the same way as
TOAST and now varvarlena have done with 4-byte datum length words.

> And it forecloses any possibility of having a type whose size is at all
> variable.

Au contraire, I think it makes it easier, at least for sizes up to 255 bytes
--- you need not introduce any more complexity into VARSIZE_ANY to have that.

> On the other hand I suppose you're concerned about the time to do a few
> comparisons before knowing which length to skip over? I'm not entirely sure
> cycle-counting at that level leads to the correct conclusions.

I am.
I have spent many, many hours examining PG profiling results, and stuff in
and around the tuple-decoding loops is almost always interesting from a
performance standpoint. Cycles spent in interpreting a toast pointer never
are (not least because you probably have to go off and do I/O after you
interpret the pointer). I'm willing to push almost any amount of work onto
toast_fetch_datum if it'll save cycles in VARSIZE_ANY. But in this case you
haven't even demonstrated a reason to think that any complexity will be added
there.

The likely uses for this, in my mind, are toast pointers with wider valueid
fields and toast pointers with indicators of different compression methods,
and those seem like they'd naturally be different sizes anyway.

			regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes:
> Here's a patch that does all of the above.

Applied, with a tweak to use the added byte as an actual length word.

I ran into an interesting failure here on HPPA: the code the compiler
generated for copying unaligned toast pointers into aligned local variables
failed, because it was assuming halfword (2-byte) alignment of the data to be
copied! (Instead of a memcpy call it was generating an inline loop of ldh/sth
instructions.) Apparently gcc's thought process is "the pointer is declared
as struct varlena *, therefore must be at least 4-aligned, therefore the data
at offset 2 is at least 2-aligned". The intermediate cast to
"varattrib_1b_e *" did not prevent this; I had to assign the datum pointer
into a separate local variable of that type to suppress the "optimization".

I'm not sure if the gcc boys would consider this a bug or not; I kinda
suspect the behavior is intentional, because otherwise they'd not be able to
optimize constructs like

	memcpy((char *) &foo, ...)

which is a pretty darn widespread locution. Anyway, it seems to work now; I
just thought I'd put something in the archives about what that
VARATT_EXTERNAL_GET_POINTER macro is for.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Apparently gcc's thought process is "the pointer is declared as struct
> varlena *, therefore must be at least 4-aligned, therefore the data at
> offset 2 is at least 2-aligned". The intermediate cast to "varattrib_1b_e *"
> did not prevent this; I had to assign the datum pointer into a separate
> local variable of that type to suppress the "optimization".

Fascinating.

Why do you cast the arguments to memcmp to char *?

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
> Why do you cast arguments to memcmp to char* ?

Well, *I* haven't done it in a long time, but it used to be a fairly standard
thing. I imagine that back before memcpy was usually declared with void *
arguments, it was necessary to avoid compiler warnings.

The problem would still exist if the code were written

	memcpy((void *) &foo, ...)

so I suspect that an in-line cast to void * wouldn't work any better.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
>> Why do you cast arguments to memcmp to char* ?
>
> Well, *I* haven't done it in a long time,

I'm referring to tuptoaster.c:488

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Gregory Stark <stark@enterprisedb.com> writes:
>>> Why do you cast arguments to memcmp to char* ?
>>
>> Well, *I* haven't done it in a long time,

> I'm referring to tuptoaster.c:488

Oh, I'm sorry, I thought you were talking about my hypothetical memcpy
example. The casts in the memcmp call were because I was worried about some
compiler trying to optimize the memcmp into a word-wide operation on the
strength of the theory that struct varlena pointers must be word-aligned.
Given the later discovery about the copying problem, those casts may be
unhelpful as far as gcc is concerned; but I'm still inclined to leave them
there in case they make a difference for some other compiler.

			regards, tom lane
* Tom Lane:
> I ran into an interesting failure here on HPPA: the code the compiler
> generated for copying unaligned toast pointers into aligned local
> variables failed, because it was assuming halfword (2-byte) alignment of
> the data to be copied! (Instead of a memcpy call it was generating an
> inline loop of ldh/sth instructions.) Apparently gcc's thought process
> is "the pointer is declared as struct varlena *, therefore must be at
> least 4-aligned, therefore the data at offset 2 is at least 2-aligned".
> The intermediate cast to "varattrib_1b_e *" did not prevent this; I
> had to assign the datum pointer into a separate local variable of that
> type to suppress the "optimization".

This is quite deliberate; it leads to better code. In general, once you've
cast a pointer to something which needs more alignment than the pointer
actually has, there is no way to get away from that (except using asm
insertions as optimization barriers, of course). This is fine as far as C
semantics go, because the undefined behavior occurs at the cast already.

I believe that other compilers have similar rules (certainly when it comes
to aliasing), so it's better not to try to outsmart the C standard here.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH        http://www.bfk.de/
Kriegsstraße 100               tel: +49-721-96201-1
D-76133 Karlsruhe              fax: +49-721-96201-99