Thread: mcvstats serialization code is still shy of a load
I'm seeing a reproducible bus error here:

#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.
)
    at mcv.c:785
785             memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));

What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as

#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))

the compiler is assuming that the first argument to memcpy is
double-aligned, and it is generating code that depends on that being
true, and of course it isn't true and kaboom.

You can *not* cast something to an aligned pointer type if it's not
actually certain to be aligned suitably for that type.  In this example,
even if you wrote "(char *)" in front of this, it wouldn't save you;
the compiler would still be entitled to believe that the intermediate
cast value meant something.  The casts in the underlying macros
ITEM_FREQUENCY and so on are equally unsafe.

(For the record, this is with gcc 4.2.1 on OpenBSD/hppa 6.4.)

			regards, tom lane
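[As a standalone illustration of the trap Tom describes - a sketch, not
code from the PostgreSQL tree: casting a byte pointer to double * asserts
alignment to the compiler, which may then emit an aligned load or store
that traps on strict-alignment hardware such as hppa or sparc64. Keeping
the pointer as char * and going through memcpy avoids the problem.]

    #include <string.h>

    static double
    read_double_at(const char *base, size_t offset)
    {
        double      tmp;

        /*
         * Unsafe on strict-alignment targets: the cast promises that
         * (base + offset) is double-aligned, so the compiler may emit
         * an aligned load that raises SIGBUS when it isn't:
         *
         *     return *(const double *) (base + offset);
         *
         * Safe: memcpy makes no alignment assumptions about its source
         * or destination.
         */
        memcpy(&tmp, base + offset, sizeof(double));
        return tmp;
    }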
On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>I'm seeing a reproducible bus error here:
>
>#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.
>)
>    at mcv.c:785
>785             memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));
>
>What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as
>
>#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))
>
>the compiler is assuming that the first argument to memcpy is
>double-aligned, and it is generating code that depends on that being
>true, and of course it isn't true and kaboom.
>
>You can *not* cast something to an aligned pointer type if it's not
>actually certain to be aligned suitably for that type.  In this example,
>even if you wrote "(char *)" in front of this, it wouldn't save you;
>the compiler would still be entitled to believe that the intermediate
>cast value meant something.  The casts in the underlying macros
>ITEM_FREQUENCY and so on are equally unsafe.
>

OK. So the solution is to ditch the casts altogether, and then do plain
pointer arithmetic like this:

#define ITEM_INDEXES(item)              (item)
#define ITEM_NULLS(item,ndims)          (ITEM_INDEXES(item) + (ndims))
#define ITEM_FREQUENCY(item,ndims)      (ITEM_NULLS(item, ndims) + (ndims))
#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

Or is that still relying on alignment, somehow?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>> You can *not* cast something to an aligned pointer type if it's not
>> actually certain to be aligned suitably for that type.

> OK. So the solution is to ditch the casts altogether, and then do plain
> pointer arithmetic like this:

> #define ITEM_INDEXES(item)              (item)
> #define ITEM_NULLS(item,ndims)          (ITEM_INDEXES(item) + (ndims))
> #define ITEM_FREQUENCY(item,ndims)      (ITEM_NULLS(item, ndims) + (ndims))
> #define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

> Or is that still relying on alignment, somehow?

No, constructs like a char* pointer plus n times sizeof(something)
should be safe.

			regards, tom lane
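[To make that concrete, a minimal sketch of my own - the offsets are
spelled out in sizeof() units per the item layout described in the
thread, but the committed macros may differ. "item" stays a plain
char *, and every possibly-unaligned field access goes through memcpy.]

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    /*
     * Hypothetical byte offsets into one serialized MCV item: ndims
     * uint16 indexes, ndims bool null flags, then frequency and base
     * frequency stored as raw double bytes. No pointer is ever cast
     * to a wider type, so no alignment promise is made.
     */
    #define ITEM_INDEXES(item)               (item)
    #define ITEM_NULLS(item, ndims)          (ITEM_INDEXES(item) + (ndims) * sizeof(uint16_t))
    #define ITEM_FREQUENCY(item, ndims)      (ITEM_NULLS(item, ndims) + (ndims) * sizeof(bool))
    #define ITEM_BASE_FREQUENCY(item, ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))

    static void
    item_store_frequency(char *item, int ndims, double freq)
    {
        /* destination may be unaligned; memcpy doesn't care */
        memcpy(ITEM_FREQUENCY(item, ndims), &freq, sizeof(double));
    }

    static double
    item_load_frequency(const char *item, int ndims)
    {
        double      freq;

        /* copy into an aligned local before using the value */
        memcpy(&freq, ITEM_FREQUENCY(item, ndims), sizeof(double));
        return freq;
    }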
On Wed, Jun 26, 2019 at 09:49:46AM +0200, Tomas Vondra wrote:
>On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>>I'm seeing a reproducible bus error here:
>>
>>#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.
>>)
>>    at mcv.c:785
>>785             memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));
>>
>>What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as
>>
>>#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, ndims) + 1))
>>
>>the compiler is assuming that the first argument to memcpy is
>>double-aligned, and it is generating code that depends on that being
>>true, and of course it isn't true and kaboom.
>>
>>You can *not* cast something to an aligned pointer type if it's not
>>actually certain to be aligned suitably for that type.  In this example,
>>even if you wrote "(char *)" in front of this, it wouldn't save you;
>>the compiler would still be entitled to believe that the intermediate
>>cast value meant something.  The casts in the underlying macros
>>ITEM_FREQUENCY and so on are equally unsafe.
>>
>
>OK. So the solution is to ditch the casts altogether, and then do plain
>pointer arithmetic like this:
>
>#define ITEM_INDEXES(item)              (item)
>#define ITEM_NULLS(item,ndims)          (ITEM_INDEXES(item) + (ndims))
>#define ITEM_FREQUENCY(item,ndims)      (ITEM_NULLS(item, ndims) + (ndims))
>#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + sizeof(double))
>
>Or is that still relying on alignment, somehow?
>

Attached is a patch that should (hopefully) fix this. It essentially
treats the item as (char *) and does all pointer arithmetic without any
additional casts. So there are no intermediate casts.

I have no way to test this, so I may either wait for you to test this
first, or push and wait. It seems to fail only on a very small number of
buildfarm animals, so having a confirmation would be nice.

The fix keeps the binary format as is, so the serialized MCV items are
max-aligned. That means we can access the uint16 indexes directly, but
we need to copy the rest of the fields (because those may not be
aligned). In hindsight that seems a bit silly, we might as well copy
everything, not care about the alignment and maybe save a few more
bytes. But that would require a catversion bump. OTOH we may need to do
that anyway, to fix the pg_mcv_list_items() signature (as discussed in
the other MCV thread).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Attached is a patch that should (hopefully) fix this. It essentially
> treats the item as (char *) and does all pointer arithmetic without any
> additional casts. So there are no intermediate casts.

This passes the eyeball test, and it also allows my OpenBSD/hppa
installation to get through the core regression tests, so I think
it's good as far as it goes.  Please push.

However ... nosing around in mcv.c, I noticed that the next macro:

/*
 * Used to compute size of serialized MCV list representation.
 */
#define MinSizeOfMCVList		\
	(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

#define SizeOfMCVList(ndims,nitems)	\
	(MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) +	\
	 MAXALIGN((ndims) * sizeof(DimensionInfo)) +		\
	 MAXALIGN((nitems) * ITEM_SIZE(ndims)))

is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.
No doubt this is why it's not used in the places where it'd matter;
the tests that do use it are testing much weaker conditions than they
should.

> The fix keeps the binary format as is, so the serialized MCV items are
> max-aligned. That means we can access the uint16 indexes directly, but
> we need to copy the rest of the fields (because those may not be
> aligned). In hindsight that seems a bit silly, we might as well copy
> everything, not care about the alignment and maybe save a few more
> bytes.

I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,
which is generally going to be a maxaligned palloc result plus 4 bytes.
So "aligned" double values are actually guaranteed to be on odd word
boundaries not even ones.

What's more, it's difficult to convince oneself that the maxaligns done
in different parts of the code are all enforcing the same choices about
which substructures get pseudo-maxaligned and which don't, because the
logic doesn't line up very well.

If we do need another catversion bump before v12, I'd vote for ripping
out Every Single One of the "maxalign" operations in this code, just
on the grounds of code simplicity and bug reduction.

			regards, tom lane
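[Tom's "odd word boundary" scenario can be checked with a few lines of
arithmetic. This is a sketch of the failure mode he describes, assuming
MAXALIGN is 8 and a 4-byte varlena header; as Tomas notes below, mcv.c
actually aligns relative to the start of the whole datum, which is meant
to avoid exactly this.]

    #include <stdio.h>
    #include <stdint.h>

    #define MAXALIGN(LEN)  (((uintptr_t) (LEN) + 7) & ~(uintptr_t) 7)
    #define VARHDRSZ 4

    int
    main(void)
    {
        uintptr_t   base = 0x10000;          /* palloc result: multiple of 8 */
        uintptr_t   data = base + VARHDRSZ;  /* varlena data area: base + 4 */
        uintptr_t   off  = MAXALIGN(13);     /* offset "maxaligned" to 16 */

        /* the absolute address is still 4 (mod 8): not double-aligned */
        printf("%lu\n", (unsigned long) ((data + off) % 8));   /* prints 4 */
        return 0;
    }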
On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Attached is a patch that should (hopefully) fix this. It essentially
>> treats the item as (char *) and does all pointer arithmetic without any
>> additional casts. So there are no intermediate casts.
>
>This passes the eyeball test, and it also allows my OpenBSD/hppa
>installation to get through the core regression tests, so I think
>it's good as far as it goes.  Please push.
>
>However ... nosing around in mcv.c, I noticed that the next macro:
>
>/*
> * Used to compute size of serialized MCV list representation.
> */
>#define MinSizeOfMCVList		\
>	(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))
>
>#define SizeOfMCVList(ndims,nitems)	\
>	(MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) +	\
>	 MAXALIGN((ndims) * sizeof(DimensionInfo)) +		\
>	 MAXALIGN((nitems) * ITEM_SIZE(ndims)))
>
>is both woefully underdocumented and completely at variance with
>reality.  It doesn't seem to be accounting for the actual data values.
>No doubt this is why it's not used in the places where it'd matter;
>the tests that do use it are testing much weaker conditions than they
>should.
>

I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's a variable amount of data - and it's encoded in
a not yet verified part of the data.

So this only includes parts with known lengths, and then the code does
this:

    for (dim = 0; dim < ndims; dim++)
    {
        ...
        expected_size += MAXALIGN(info[dim].nbytes);
    }

and uses that to check the actual length:

    if (VARSIZE_ANY(data) != expected_size)
        elog(ERROR, ...);

That being said, maybe this is unnecessarily defensive and we should
just trust the values not being corrupted. So if we get a pg_mcv_list
value, we'd simply assume it's OK.

>> The fix keeps the binary format as is, so the serialized MCV items are
>> max-aligned. That means we can access the uint16 indexes directly, but
>> we need to copy the rest of the fields (because those may not be
>> aligned). In hindsight that seems a bit silly, we might as well copy
>> everything, not care about the alignment and maybe save a few more
>> bytes.
>
>I think that part of the problem here is that the way this code is
>written, "maxaligned" is no such thing.  What you're actually maxaligning
>seems to be the offset from the start of the data area of a varlena value,
>which is generally going to be a maxaligned palloc result plus 4 bytes.
>So "aligned" double values are actually guaranteed to be on odd word
>boundaries not even ones.
>

I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to. At least that was
the intent of code like this:

    raw = palloc0(total_length);
    ...
    /* the header may not be exactly aligned, so make sure it is */
    ptr = raw + MAXALIGN(ptr - raw);

If it's not like that in some place, it's a bug.

>What's more, it's difficult to convince oneself that the maxaligns done
>in different parts of the code are all enforcing the same choices about
>which substructures get pseudo-maxaligned and which don't, because the
>logic doesn't line up very well.
>

Not sure. If there's a way to make it clearer, I'm ready to do the work.
Unfortunately it's hard for me to judge that, because I've spent so much
time on that code that it seems fairly clear to me.

>If we do need another catversion bump before v12, I'd vote for ripping
>out Every Single One of the "maxalign" operations in this code, just
>on the grounds of code simplicity and bug reduction.
>

Hmmm, OK. The original reason to keep the parts aligned was to be able
to reference the parts directly during processing. If we get rid of the
alignment, we'll have to memcpy everything during deserialization. But
if it makes the code simpler, it might be worth it - this part of the
code was clearly the weakest part of the patch.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
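[Putting the two halves of that check together, the validation flow is
roughly this - a condensed sketch, not verbatim mcv.c, and the error
message text is mine:]

    Size        expected_size = SizeOfMCVList(ndims, nitems);

    /* add the variable-length value data, once DimensionInfo is readable */
    for (dim = 0; dim < ndims; dim++)
        expected_size += MAXALIGN(info[dim].nbytes);

    if (VARSIZE_ANY(data) != expected_size)
        elog(ERROR, "invalid MCV list size %zu (expected %zu)",
             (size_t) VARSIZE_ANY(data), (size_t) expected_size);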
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems)	\
>> is both woefully underdocumented and completely at variance with
>> reality.  It doesn't seem to be accounting for the actual data values.

> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's a variable amount of data - and it's encoded in
> a not yet verified part of the data.

Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.

> That being said, maybe this is unnecessarily defensive and we should
> just trust the values not being corrupted.

No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.

>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,

> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.

[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
	/* pointer to the data part (skip the varlena header) */
	ptr = VARDATA_ANY(data);
	raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

	/* remember start of datum for maxalign reference */
	raw = (char *) data;

	/* pointer to the data part (skip the varlena header) */
	ptr = VARDATA_ANY(data);

Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

	/* remember start of datum for maxalign reference */
	raw = (char *) data;

	/* alignment logic assumes full-size datum header */
	Assert(VARATT_IS_4B(data));

	/* pointer to the data part (skip the varlena header) */
	ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...

			regards, tom lane
On Wed, Jun 26, 2019 at 12:31:13PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>>> #define SizeOfMCVList(ndims,nitems)	\
>>> is both woefully underdocumented and completely at variance with
>>> reality.  It doesn't seem to be accounting for the actual data values.
>
>> I agree about the macro being underdocumented, but AFAICS it's used
>> correctly to check the expected length. It can't include the data values
>> directly, because that's a variable amount of data - and it's encoded in
>> a not yet verified part of the data.
>
>Well, it should have some other name then.  Or *at least* a comment.
>It's unbelievably misleading as it stands.
>

True.

>> That being said, maybe this is unnecessarily defensive and we should
>> just trust the values not being corrupted.
>
>No, I'm on board with checking the lengths.  I just don't like how
>hard it is to discern what's being checked.
>

Understood.

>>> I think that part of the problem here is that the way this code is
>>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>>> seems to be the offset from the start of the data area of a varlena value,
>
>> I don't think so. The pointers should be maxaligned with respect to the
>> whole varlena value, which is what 'raw' points to.
>
>[ squint ... ]  OK, I think I misread this:
>
>statext_mcv_deserialize(bytea *data)
>{
>...
>	/* pointer to the data part (skip the varlena header) */
>	ptr = VARDATA_ANY(data);
>	raw = (char *) data;
>
>I think this is confusing in itself --- I read it as "raw = (char *) ptr"
>and I think most other people would assume that too based on the order
>of operations.  It'd read better as
>
>	/* remember start of datum for maxalign reference */
>	raw = (char *) data;
>
>	/* pointer to the data part (skip the varlena header) */
>	ptr = VARDATA_ANY(data);
>

Yeah, that'd have been better.

>Another problem with this code is that it flat doesn't work for
>non-4-byte-header varlenas: it'd do the alignment differently than the
>serialization side did.  That's okay given that the two extant call sites
>are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
>completely misleading appearance of being ready to deal with short-header
>varlenas, and heaven forbid there should be any comment to discourage
>future coders from trying.  So really what I'd like to see here is
>
>	/* remember start of datum for maxalign reference */
>	raw = (char *) data;
>
>	/* alignment logic assumes full-size datum header */
>	Assert(VARATT_IS_4B(data));
>
>	/* pointer to the data part (skip the varlena header) */
>	ptr = VARDATA_ANY(data);
>
>Or, of course, this could all go away if we got rid of the
>bogus maxaligning...
>

OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros used to access parts of the serialized data, and
everything gets copied.

The main complication is with varlena values, which may or may not have
4B headers (for now there's the PG_DETOAST_DATUM call, but as you
mentioned we may want to remove it in the future). So I've stored the
length as a uint32 separately, followed by the full varlena value
(thanks to that the deserialization is simpler). Not sure if that's the
best solution, though, because this way we store the length twice.

I've kept the alignment in the deserialization code, because there it
allows us to allocate the whole value as a single chunk, which I think
is useful (I admit I don't have any measurements to demonstrate that).

But if we decide to rework this later, we can - it's just the in-memory
representation, not on-disk.

Is this roughly what you had in mind?

FWIW I'm sure some of the comments are stale and/or need clarification,
but it's a bit too late over here, so I'll look into that tomorrow.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
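[For context, the single-chunk pattern mentioned above looks roughly
like this - a sketch under my own assumptions about names and layout,
not the patch itself: size everything up front, allocate once, and
carve maxaligned slices out of that one block.]

    /* first pass: total size of the in-memory representation */
    Size        total_length = MAXALIGN(sizeof(MCVList));

    for (dim = 0; dim < ndims; dim++)
        total_length += MAXALIGN(info[dim].nbytes);   /* per-dimension values */

    /* one allocation; the struct and all value data share it */
    mcvlist = (MCVList *) palloc0(total_length);
    dataptr = (char *) mcvlist + MAXALIGN(sizeof(MCVList));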
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> OK. Attached is a patch ditching the alignment in serialized data. I've
> ditched the macros used to access parts of the serialized data, and
> everything gets copied.

I lack energy to actually read this patch right now, and I don't currently
have an opinion about whether it's worth another catversion bump to fix
this stuff in v12.  But I did test the patch, and I can confirm it gets
through the core regression tests on hppa (both gaur's host environment
with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).

			regards, tom lane
On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> OK. Attached is a patch ditching the alignment in serialized data. I've
>> ditched the macros used to access parts of the serialized data, and
>> everything gets copied.
>
>I lack energy to actually read this patch right now, and I don't currently
>have an opinion about whether it's worth another catversion bump to fix
>this stuff in v12.  But I did test the patch, and I can confirm it gets
>through the core regression tests on hppa (both gaur's host environment
>with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).
>

Thanks for running it through the regression tests, that alone is a very
useful piece of information for me.

As for the catversion bump - I'd probably vote to do it. Not just because
of this serialization stuff, but to fix the pg_mcv_list_items function.
It's not something I'm very enthusiastic about (kinda embarrassed about
it, really), but it seems better than shipping something that we'll need
to rework in PG13.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 27, 2019 at 01:26:32PM +0200, Tomas Vondra wrote:
>On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:
>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>OK. Attached is a patch ditching the alignment in serialized data. I've
>>>ditched the macros used to access parts of the serialized data, and
>>>everything gets copied.
>>
>>I lack energy to actually read this patch right now, and I don't currently
>>have an opinion about whether it's worth another catversion bump to fix
>>this stuff in v12.  But I did test the patch, and I can confirm it gets
>>through the core regression tests on hppa (both gaur's host environment
>>with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).
>>
>
>Thanks for running it through the regression tests, that alone is a very
>useful piece of information for me.
>
>As for the catversion bump - I'd probably vote to do it. Not just because
>of this serialization stuff, but to fix the pg_mcv_list_items function.
>It's not something I'm very enthusiastic about (kinda embarrassed about
>it, really), but it seems better than shipping something that we'll need
>to rework in PG13.
>

Attached is a slightly improved version of the serialization patch.

The main difference is that when serializing varlena values, the
previous patch version stored

    length (uint32) + full varlena (incl. the header)

which is kinda redundant, because the varlena stores the length too. So
now it only stores the length + data, without the varlena header. I
don't think there's a better way to store varlena values without
enforcing alignment (which is what happens in current master).

There's one additional change I failed to mention before - I had to add
another field to DimensionInfo, tracking how much space will be needed
for the deserialized data. This is needed because the deserialization
allocates the whole MCV list as a single chunk of memory, to reduce
palloc overhead. It could parse the data twice (first to determine the
space, then to actually parse it), but this allows doing just a single
pass. Which seems useful for large MCV lists, but maybe it's not worth
it?

Barring objections I'll commit this together with the pg_mcv_list_items
fix, posted in a separate thread. Of course, this requires a catversion
bump - an alternative would be to keep enforcing the alignment, but
tweak the macros to work on all platforms without SIGBUS.

Considering how troublesome this serialization part of the patch turned
out to be, I'm not really sure about anything at this point. So I'd
welcome thoughts about the proposed changes.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
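[The serialization side of that scheme reduces to a few lines; this
sketch matches the hunks in Tom's review below, assuming ptr is the
char * write cursor and value the Datum being written.]

    uint32      len = VARSIZE_ANY_EXHDR(DatumGetPointer(value));

    /* explicit length prefix, copied byte-wise (no alignment assumed) */
    memcpy(ptr, &len, sizeof(uint32));
    ptr += sizeof(uint32);

    /* payload only: the varlena header is not stored */
    memcpy(ptr, VARDATA_ANY(DatumGetPointer(value)), len);
    ptr += len;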
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Attached is a slightly improved version of the serialization patch.

I reviewed this patch, and tested it on hppa and ppc.  I found one
serious bug: in the deserialization varlena case, you need

-					dataptr += MAXALIGN(len);
+					dataptr += MAXALIGN(len + VARHDRSZ);

(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
plus the Assert a few lines further down about dataptr having been
advanced by the correct amount fires.  (On one machine I tested on,
that happened during the core regression tests.  The other machine
got through regression, but trying to do "select * from pg_stats_ext;"
afterwards exhibited the crash.  I didn't investigate closely, but
I suspect the difference has to do with different MAXALIGN values,
4 and 8 respectively.)

The attached patch (a delta atop your v2) corrects that plus some
cosmetic issues.

If we're going to push this, it would be considerably less complicated
to do so before v12 gets branched --- not long after that, there will be
catversion differences to cope with.  I'm planning to make the branch
tomorrow (Monday), probably ~1500 UTC.  Just sayin'.

			regards, tom lane

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 256728a..cfe7e54 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -40,13 +40,14 @@
  * stored in a separate array (deduplicated, to minimize the size), and
  * so the serialized items only store uint16 indexes into that array.
  *
- * Each serialized item store (in this order):
+ * Each serialized item stores (in this order):
  *
  * - indexes to values (ndim * sizeof(uint16))
  * - null flags (ndim * sizeof(bool))
  * - frequency (sizeof(double))
  * - base_frequency (sizeof(double))
  *
+ * There is no alignment padding within an MCV item.
  * So in total each MCV item requires this many bytes:
  *
  * ndim * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)
@@ -61,7 +62,7 @@
 	(VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))
 
 /*
- * Size of the serialized MCV list, exluding the space needed for
+ * Size of the serialized MCV list, excluding the space needed for
  * deduplicated per-dimension values. The macro is meant to be used
  * when it's not yet safe to access the serialized info about amount
  * of data for each column.
@@ -619,6 +620,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 		else if (info[dim].typlen == -1)	/* varlena */
 		{
 			info[dim].nbytes = 0;
+			info[dim].nbytes_aligned = 0;
 			for (i = 0; i < info[dim].nvalues; i++)
 			{
 				Size		len;
@@ -646,6 +648,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 		else if (info[dim].typlen == -2)	/* cstring */
 		{
 			info[dim].nbytes = 0;
+			info[dim].nbytes_aligned = 0;
 			for (i = 0; i < info[dim].nvalues; i++)
 			{
 				Size		len;
@@ -743,7 +746,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 				 * assumes little endian behavior.  store_att_byval does
 				 * almost what we need, but it requires properly aligned
 				 * buffer - the output buffer does not guarantee that. So we
-				 * simply use a static Datum variable (which guarantees proper
+				 * simply use a local Datum variable (which guarantees proper
 				 * alignment), and then copy the value from it.
 				 */
 				store_att_byval(&tmp, value, info[dim].typlen);
@@ -759,14 +762,14 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
 			}
 			else if (info[dim].typlen == -1)	/* varlena */
 			{
-				uint32		len = VARSIZE_ANY_EXHDR(value);
+				uint32		len = VARSIZE_ANY_EXHDR(DatumGetPointer(value));
 
 				/* copy the length */
 				memcpy(ptr, &len, sizeof(uint32));
 				ptr += sizeof(uint32);
 
 				/* data from the varlena value (without the header) */
-				memcpy(ptr, VARDATA(DatumGetPointer(value)), len);
+				memcpy(ptr, VARDATA_ANY(DatumGetPointer(value)), len);
 				ptr += len;
 			}
 			else if (info[dim].typlen == -2)	/* cstring */
@@ -1100,7 +1103,7 @@ statext_mcv_deserialize(bytea *data)
 					map[dim][i] = PointerGetDatum(dataptr);
 
 					/* skip to place of the next deserialized value */
-					dataptr += MAXALIGN(len);
+					dataptr += MAXALIGN(len + VARHDRSZ);
 				}
 			}
 			else if (info[dim].typlen == -2)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 6778746..8fc5419 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -36,7 +36,7 @@ typedef struct DimensionInfo
 {
 	int			nvalues;		/* number of deduplicated values */
 	int			nbytes;			/* number of bytes (serialized) */
-	int			nbytes_aligned; /* deserialized data with alignment */
+	int			nbytes_aligned; /* size of deserialized data with alignment */
 	int			typlen;			/* pg_type.typlen */
 	bool		typbyval;		/* pg_type.typbyval */
 } DimensionInfo;
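[To see why the short advance corrupts the output, consider a 5-byte
payload - a worked example, assuming the usual 4-byte VARHDRSZ; the
deserialized value is laid down as a full varlena (header plus data)
at dataptr, per the hunk above:]

    bytes written at dataptr:  VARHDRSZ + len = 4 + 5 = 9

    with MAXALIGN = 8:  MAXALIGN(len)            = 8
                        MAXALIGN(len + VARHDRSZ) = 16  (correct)
    with MAXALIGN = 4:  MAXALIGN(len)            = 8
                        MAXALIGN(len + VARHDRSZ) = 12  (correct)

[Either way the cursor advances less than the 9 bytes just written, so
later values overwrite earlier ones and the amount-advanced Assert
fires; the exact offsets differ with MAXALIGN, consistent with the
platform-dependent behavior Tom observed.]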
On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Attached is a slightly improved version of the serialization patch.
>
>I reviewed this patch, and tested it on hppa and ppc.  I found one
>serious bug: in the deserialization varlena case, you need
>
>-					dataptr += MAXALIGN(len);
>+					dataptr += MAXALIGN(len + VARHDRSZ);
>
>(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
>plus the Assert a few lines further down about dataptr having been
>advanced by the correct amount fires.  (On one machine I tested on,
>that happened during the core regression tests.  The other machine
>got through regression, but trying to do "select * from pg_stats_ext;"
>afterwards exhibited the crash.  I didn't investigate closely, but
>I suspect the difference has to do with different MAXALIGN values,
>4 and 8 respectively.)
>
>The attached patch (a delta atop your v2) corrects that plus some
>cosmetic issues.
>

Thanks.

>If we're going to push this, it would be considerably less complicated
>to do so before v12 gets branched --- not long after that, there will be
>catversion differences to cope with.  I'm planning to make the branch
>tomorrow (Monday), probably ~1500 UTC.  Just sayin'.
>

Unfortunately, I was travelling on Sunday and was quite busy on Monday,
so I've been unable to push this before the branching :-(

I'll push by the end of this week, once I get home.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 02, 2019 at 10:38:29AM +0200, Tomas Vondra wrote:
>On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:
>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>Attached is a slightly improved version of the serialization patch.
>>
>>I reviewed this patch, and tested it on hppa and ppc.  I found one
>>serious bug: in the deserialization varlena case, you need
>>
>>-					dataptr += MAXALIGN(len);
>>+					dataptr += MAXALIGN(len + VARHDRSZ);
>>
>>(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
>>plus the Assert a few lines further down about dataptr having been
>>advanced by the correct amount fires.  (On one machine I tested on,
>>that happened during the core regression tests.  The other machine
>>got through regression, but trying to do "select * from pg_stats_ext;"
>>afterwards exhibited the crash.  I didn't investigate closely, but
>>I suspect the difference has to do with different MAXALIGN values,
>>4 and 8 respectively.)
>>
>>The attached patch (a delta atop your v2) corrects that plus some
>>cosmetic issues.
>>
>
>Thanks.
>
>>If we're going to push this, it would be considerably less complicated
>>to do so before v12 gets branched --- not long after that, there will be
>>catversion differences to cope with.  I'm planning to make the branch
>>tomorrow (Monday), probably ~1500 UTC.  Just sayin'.
>>
>
>Unfortunately, I was travelling on Sunday and was quite busy on Monday,
>so I've been unable to push this before the branching :-(
>
>I'll push by the end of this week, once I get home.
>

I've pushed the fix (along with the pg_mcv_list_item fix) into master;
hopefully the buildfarm won't be upset about it.

I was about to push into REL_12_STABLE, when I realized that maybe we
need to do something about the catversion first. REL_12_STABLE is still
on 201906161, while master got to 201907041 thanks to commit
7b925e12703. Simply cherry-picking the commits would get us to 201907052
in both branches, but that'd be wrong as the catalogs do differ. I
suppose this is what you meant by "catversion differences to cope with".

I suppose this is not the first time this happened - how did we deal
with it in the past? I guess we could use some "past" non-conflicting
catversion number in the REL_12_STABLE branch (say, 201907030x) but
maybe that'd be wrong?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I was about to push into REL_12_STABLE, when I realized that maybe we
> need to do something about the catversion first. REL_12_STABLE is still
> on 201906161, while master got to 201907041 thanks to commit
> 7b925e12703. Simply cherry-picking the commits would get us to
> 201907052 in both branches, but that'd be wrong as the catalogs do
> differ. I suppose this is what you meant by "catversion differences to
> cope with".

Yeah, exactly.

My recommendation is to use 201907051 on v12 and 201907052
on master (or whatever is $today for you).  They need to be
different now that the branches' catalog histories have diverged,
and it seems to me that the back branch should be "older".

			regards, tom lane
On Fri, Jul 5, 2019, 03:28 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > I was about to push into REL_12_STABLE, when I realized that maybe we
> > need to do something about the catversion first. REL_12_STABLE is still
> > on 201906161, while master got to 201907041 thanks to commit
> > 7b925e12703. Simply cherry-picking the commits would get us to
> > 201907052 in both branches, but that'd be wrong as the catalogs do
> > differ. I suppose this is what you meant by "catversion differences to
> > cope with".
>
> Yeah, exactly.
>
> My recommendation is to use 201907051 on v12 and 201907052
> on master (or whatever is $today for you).  They need to be
> different now that the branches' catalog histories have diverged,
> and it seems to me that the back branch should be "older".
>
> 			regards, tom lane

Unfortunately, master is already using both 201907051 and 201907052 (two
of the patches I pushed touched the catalog), so we can't quite do
exactly that. We need to use 201907042 and 201907043, or something
preceding 201907041 (which is the extra catversion on master).

At this point there's no perfect sequence, thanks to the extra commit on
master, so REL_12_STABLE can't be exactly "older" :-(

Barring objections, I'll go ahead with 201907042+201907043 later today,
before someone pushes another catversion-bumping patch.

regards
On Fri, Jul 05, 2019 at 10:36:59AM +0200, Tomas Vondra wrote:
>On Fri, Jul 5, 2019, 03:28 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> > I was about to push into REL_12_STABLE, when I realized that maybe we
>> > need to do something about the catversion first. REL_12_STABLE is still
>> > on 201906161, while master got to 201907041 thanks to commit
>> > 7b925e12703. Simply cherry-picking the commits would get us to
>> > 201907052 in both branches, but that'd be wrong as the catalogs do
>> > differ. I suppose this is what you meant by "catversion differences to
>> > cope with".
>>
>> Yeah, exactly.
>>
>> My recommendation is to use 201907051 on v12 and 201907052
>> on master (or whatever is $today for you).  They need to be
>> different now that the branches' catalog histories have diverged,
>> and it seems to me that the back branch should be "older".
>>
>> 			regards, tom lane
>>
>
>Unfortunately, master is already using both 201907051 and 201907052 (two
>of the patches I pushed touched the catalog), so we can't quite do
>exactly that. We need to use 201907042 and 201907043, or something
>preceding 201907041 (which is the extra catversion on master).
>
>At this point there's no perfect sequence, thanks to the extra commit on
>master, so REL_12_STABLE can't be exactly "older" :-(
>
>Barring objections, I'll go ahead with 201907042+201907043 later today,
>before someone pushes another catversion-bumping patch.
>

I've pushed the REL_12_STABLE backpatches too, now. I've ended up using
201907031 and 201907032 - those values precede the first catversion bump
in master (201907041), so the back branch looks "older". And there's a
bit of slack for additional bumps (in the unlikely case we need them).

We might have "fixed" this by backpatching the commit with the extra
catversion bump (7b925e12), but that commit seems a bit too large for
that, even though it's fairly isolated. And it seems like a bad practice
anyway.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I've pushed the REL_12_STABLE backpatches too, now. I've ended up using
> 201907031 and 201907032 - those values precede the first catversion bump
> in master (201907041), so the back branch looks "older". And there's a
> bit of slack for additional bumps (in the unlikely case we need them).

FWIW, I don't think there's a need for every catversion on the back branch
to look older than any catversion on HEAD.  The requirement so far as the
core code is concerned is only for non-equality.  Now, extension code does
often do something like "if catversion >= xxx", but in practice they're
only concerned about numbers used by released versions.  HEAD's catversion
will be strictly greater than v12's soon enough, even if you had made it
not so today.  So I think sticking to today's-date-with-some-N is better
than artificially assigning other dates.  What's done is done, and there's
no need to change it, but now you know what to do next time.

> We might have "fixed" this by backpatching the commit with the extra
> catversion bump (7b925e12), but that commit seems a bit too large for
> that, even though it's fairly isolated. And it seems like a bad practice
> anyway.

Yeah, that approach flies in the face of the notion of feature freeze.

			regards, tom lane
On 2019-Jul-05, Tom Lane wrote:

> FWIW, I don't think there's a need for every catversion on the back branch
> to look older than any catversion on HEAD.  The requirement so far as the
> core code is concerned is only for non-equality.  Now, extension code does
> often do something like "if catversion >= xxx", but in practice they're
> only concerned about numbers used by released versions.

pg_upgrade also uses >= catversion comparison for a couple of things.
I don't think it affects this case, but it's worth keeping in mind.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services