Thread: Missing update of all_hasnulls in BRIN opclasses
Hi, While working on some BRIN code, I discovered a bug in handling NULL values - when inserting a non-NULL value into a NULL-only range, we reset the all_nulls flag but don't update the has_nulls flag. And because of that we then fail to return the range for IS NULL ranges. Reproducing this is trivial: create table t (a int); create index on t using brin (a); insert into t values (null); insert into t values (1); set enable_seqscan = off; select * from t where a is null; This should return 1 row, but actually it returns no rows. Attached is a patch fixing this by properly updating the has_nulls flag. I reproduced this all the way back to 9.5, so it's a long-standing bug. It's interesting no one noticed / reported it so far, it doesn't seem like a particularly rare corner case. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, 21 Oct 2022 at 17:24, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > While working on some BRIN code, I discovered a bug in handling NULL > values - when inserting a non-NULL value into a NULL-only range, we > reset the all_nulls flag but don't update the has_nulls flag. And > because of that we then fail to return the range for IS NULL ranges. Ah, that's bad. One question though: doesn't (shouldn't?) column->bv_allnulls already imply column->bv_hasnulls? The column has nulls if all of the values are null, right? Or is the description of the field deceptive, and does bv_hasnulls actually mean "has nulls bitmap"? > Attached is a patch fixing this by properly updating the has_nulls flag. One comment on the patch: > +SET enable_seqscan = off; > + [...] > +SET enable_seqscan = off; Looks like duplicated SETs. Should that last one be RESET instead? Apart from that, this patch looks good. - Matthias
On 10/21/22 17:50, Matthias van de Meent wrote: > On Fri, 21 Oct 2022 at 17:24, Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> Hi, >> >> While working on some BRIN code, I discovered a bug in handling NULL >> values - when inserting a non-NULL value into a NULL-only range, we >> reset the all_nulls flag but don't update the has_nulls flag. And >> because of that we then fail to return the range for IS NULL ranges. > > Ah, that's bad. > Yeah, I guess we'll need to inform the users to consider rebuilding BRIN indexes on NULL-able columns. > One question though: doesn't (shouldn't?) column->bv_allnulls already > imply column->bv_hasnulls? The column has nulls if all of the values > are null, right? Or is the description of the field deceptive, and > does bv_hasnulls actually mean "has nulls bitmap"? > What null bitmap do you mean? We're talking about summary for a page range - IIRC we translate this to nullbitmap for a BRIN tuple, but there may be multiple columns, and "has nulls bitmap" is an aggregate over all of them. Yeah, maybe it'd make sense to also have has_nulls=true whenever all_nulls=true, and maybe it'd be simpler because it'd be enough to check just one flag in consistent function etc. But we still need to track 2 different states - "has nulls" and "has summary". In any case, this ship sailed long ago - at least for the existing opclasses. >> Attached is a patch fixing this by properly updating the has_nulls flag. > > One comment on the patch: > >> +SET enable_seqscan = off; >> + [...] >> +SET enable_seqscan = off; > > Looks like duplicated SETs. Should that last one be RESET instead? > Yeah, should have been RESET. > Apart from that, this patch looks good. > Thanks! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10/21/22 18:44, Tomas Vondra wrote: > > ... > >> Apart from that, this patch looks good. >> Sadly, I don't think we can fix it like this :-( The problem is that all ranges start with all_nulls=true, because the new range gets initialized by brin_memtuple_initialize() like that. But this happens for *every* range before we even start processing the rows. So this way all the ranges would end up with has_nulls=true, making that flag pretty useless. Actually, even just doing "truncate" on the table creates such all-nulls range for the first range, and serializes it to disk. I wondered why we even write such tuples for "empty" ranges to disk, for example after "TRUNCATE" - the table is empty by definition, so how come we write all-nulls brin summary for the first range? For example brininsert() checks if the brin tuple was modified and needs to be written back, but brinbuild() just ignores that, and initializes (and writes) writes the tuple to disk anyway. I think we should not do that - there should be a flag in BrinBuildState, tracking if the BRIN tuple was modified, and we should only write it if it's true. That means we should never get an on-disk summary representing nothing. That doesn't fix the issue, though, because we still need to pass the memtuple tuple to the add_value opclass procedure, and whether it sets the has_nulls flag depends on whether it's a new tuple representing no other rows (in which case has_nulls remains false) or whether it was read from disk (in which case it needs to be flipped to 'true'). But the opclass has no way to tell the difference at the moment - it just gets the BrinMemTuple. So we'd have to extend this, somehow. I wonder how to do this in a back-patchable way - we can't add parameters to the opclass procedure, and the other solution seems to be storing it right in the BrinMemTuple, somehow. But that's likely an ABI break :-( The only solution I can think of is actually passing it using all_nulls and has_nulls - we could set both flags to true (which we never do now) and teach the opclass that it signifies "empty" (and thus not to update has_nulls after resetting all_nulls). Something like the attached (I haven't added any more tests, not sure what would those look like - I can't think of a query testing this, although maybe we could check how the flags change using pageinspect). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2022-Oct-22, Tomas Vondra wrote: > I wonder how to do this in a back-patchable way - we can't add > parameters to the opclass procedure, and the other solution seems to be > storing it right in the BrinMemTuple, somehow. But that's likely an ABI > break :-( Hmm, I don't see the ABI incompatibility. BrinMemTuple is an in-memory structure, so you can add new members at the end of the struct and it will pose no problems to existing code. > The only solution I can think of is actually passing it using all_nulls > and has_nulls - we could set both flags to true (which we never do now) > and teach the opclass that it signifies "empty" (and thus not to update > has_nulls after resetting all_nulls). > > Something like the attached (I haven't added any more tests, not sure > what would those look like - I can't think of a query testing this, > although maybe we could check how the flags change using pageinspect). I'll try to have a look at these patches tomorrow or on Monday. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "I suspect most samba developers are already technically insane... Of course, since many of them are Australians, you can't tell." (L. Torvalds)
On 10/22/22 10:00, Alvaro Herrera wrote: > On 2022-Oct-22, Tomas Vondra wrote: > >> I wonder how to do this in a back-patchable way - we can't add >> parameters to the opclass procedure, and the other solution seems to be >> storing it right in the BrinMemTuple, somehow. But that's likely an ABI >> break :-( > > Hmm, I don't see the ABI incompatibility. BrinMemTuple is an in-memory > structure, so you can add new members at the end of the struct and it > will pose no problems to existing code. > But we're not passing BrinMemTuple to the opclass procedures - we're passing a pointer to BrinValues, so we'd have to add the flag there. And we're storing an array of those, so adding a field may shift the array even if you add it at the end. Not sure if that's OK or not. >> The only solution I can think of is actually passing it using all_nulls >> and has_nulls - we could set both flags to true (which we never do now) >> and teach the opclass that it signifies "empty" (and thus not to update >> has_nulls after resetting all_nulls). >> >> Something like the attached (I haven't added any more tests, not sure >> what would those look like - I can't think of a query testing this, >> although maybe we could check how the flags change using pageinspect). > > I'll try to have a look at these patches tomorrow or on Monday. > I was experimenting with this a bit more, and unfortunately the latest patch is still a few bricks shy - it did fix this particular issue, but there were other cases that remained/got broken. See the first patch, that adds a bunch of pageinspect tests testing different combinations. After thinking about it a bit more, I think we can't quite fix this at the opclass level, so the yesterday's patches are wrong. Instead, this should be fixed in values_add_to_range() - the whole trick is we need to remember the range was empty at the beginning, and only set the flag when allnulls is false. The reworked patch does that. And we can use the same logic (both flags set mean no tuples were added to the range) when building the index, a separate flag is not needed. This slightly affects existing regression tests, because we won't create any ranges for empty table (now we created one, because we initialized a tuple in brinbuild and then wrote it to disk). This means that brin_summarize_range now returns 0, but I think that's fine. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi, Tomas:
For 0002-fixup-brin-has_nulls-20221022.patch :
+ first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+ if (bval->bv_hasnulls && bval->bv_allnulls)
+
+ if (bval->bv_hasnulls && bval->bv_allnulls)
It seems the if condition can be changed to `if (first_row)` which is more readable.
Chhers
Here's an improved version of the fix I posted about a month ago. 0001 Adds tests demonstrating the issue, as before. I realized there's an isolation test in src/test/module/brin that can demonstrate this, so I modified it too, not just the pageinspect test as before. 0002 Uses the combination of all_nulls/has_nulls to identify "empty" range, and does not store them to disk. I however realized not storing "empty" ranges is probably not desirable. Imagine a table with a "gap" (e.g. due to a batch DELETE) of pages with no rows: create table x (a int) with (fillfactor = 10); insert into x select i from generate_series(1,1000) s(i); delete from x where a < 1000; create index on x using brin(a) with (pages_per_range=1); Any bitmap index scan using this index would have to scan all those empty ranges, because there are no summaries. 0003 Still uses the all_nulls/has_nulls flags to identify empty ranges, but stores them - and then we check the combination in bringetbitmap() to skip those ranges as not matching any scan keys. This also restores some of the existing behavior - for example creating a BRIN index on entirely empty table (no pages at all) still allocates a 48kB index (3 index pages, 3 fsm pages). Seems a bit strange, but it's an existing behavior. As explained before, I've considered adding an new flag to one of the BRIN structs - BrinMemTuple or BrinValues. But we can't add as last field to BrinMemTuple because there already is FLEXIBLE_ARRAY_MEMBER, and adding a field to BrinValues would change stride of the bt_columns array. So this would break ABI, making this not backpatchable. Furthermore, if we want to store summaries for empty ranges (which is what 0003 does), we need to store the flag in the BRIN index tuple. And we can't change the on-disk representation in backbranches, so encoding this in the existing tuple seems like the only way. So using the combination of all_nulls/has_nulls flag seems like the only viable option, unfortunately. Opinions? Considering this will need to be backpatches, it'd be good to get some feedback on the approach. I think it's fine, but it would be unfortunate to fix one issue but break BRIN in a different way. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Mon, Nov 28, 2022 at 01:13:14AM +0100, Tomas Vondra wrote: > Opinions? Considering this will need to be backpatches, it'd be good to > get some feedback on the approach. I think it's fine, but it would be > unfortunate to fix one issue but break BRIN in a different way. > --- a/contrib/pageinspect/Makefile > +++ b/contrib/pageinspect/Makefile > @@ -22,7 +22,7 @@ DATA = pageinspect--1.10--1.11.sql \ > pageinspect--1.0--1.1.sql > PGFILEDESC = "pageinspect - functions to inspect contents of database pages" > > -REGRESS = page btree brin gin gist hash checksum oldextversions > +REGRESS = page btree brin gin gist hash checksum oldextversions brinbugs I can't comment on the patch itself, but: These changes to ./Makefile will also need to be made in ./meson.build. Also (per cirrusci), the test sometimes fail since two parallel tests are doing "CREATE EXTENSION".
Hi, here's an improved and cleaned-up version of the fix. I removed brinbugs.sql from pageinspect, because it seems enough to have the other tests (I added brinbugs first, before realizing those exist). This also means meson.build is fine and there are no tests doing CREATE EXTENSION concurrently etc. I decided to go with the 0003 approach, which stores summaries for empty ranges. That seems to be less intrusive (it's more like what we do now), and works better for tables with a lot of bulk deletes. It means we can have ranges with allnulls=hasnulls=true, which wasn't the case before, but I don't see why this should break e.g. custom opclasses (if it does, it probably means the opclass is wrong). Finally, I realized union_tuples needs to be tweaked to deal with empty ranges properly. The changes are fairly limited, though. I plan to push this into master right at the beginning of January, and then backpatch a couple days later. I still feel a bit uneasy about tweaking this, but I don't think there's a better way than reusing the existing flags. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, Dec 30, 2022 at 01:18:36AM +0100, Tomas Vondra wrote: > + * Does the range already has NULL values? Either of the flags can should say: "already have NULL values" > + * If we had NULLS, and the opclass didn't set allnulls=true, set > + * the hasnulls so that we know there are NULL values. You could remove "the" before "hasnulls". Or say "clear hasnulls so that.." > @@ -585,6 +587,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple) > { > int i; > > + /* > + * Make sure to overwrite the hasnulls flag, because it was initialized > + * to true by brin_memtuple_initialize and we don't want to skip it if > + * allnulls. Does "if allnulls" mean "if allnulls is true" ? It's a bit unclear. > + */ > + dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno]; > + > if (allnulls[keyno]) > { > valueno += brdesc->bd_info[keyno]->oi_nstored; -- Justin
Thanks Justin! I've applied all the fixes you proposed, and (hopefully) improved a couple other comments. I've been working on this over the past couple days, trying to polish and commit it over the weekend - both into master and backbranches. Sadly, the backpatching part turned out to be a bit more complicated than I expected, because of the BRIN reworks in PG14 (done by me, as foundation for the new opclasses, so ... well). Anyway, I got it done, but it's a bit uglier than I hoped for and I don't feel like pushing this on Sunday midnight. I think it's correct, but maybe another pass to polish it a bit more is better. So here are two patches - one for 11-13, the other for 14-master. There's also a separate patch with pageinspect tests, but only as a demonstration of various (non)broken cases, not for commit. And then also a bash script generating indexes with random data, randomized summarization etc. - on unpatched systems this happens to fail in about 1/3 of the runs (at least for me). I haven't seen any failures with the patches attached (on any branch). As for the issue / fix, I don't think there's a better solution than what the patch does - we need to distinguish empty / all-nulls ranges, but we can't add a flag because of on-disk format / ABI. So using the existing flags seems like the only option - I haven't heard any other ideas so far, and I couldn't come up with any myself either. I've also thought about alternative "encodings" into allnulls/hasnulls, instead of treating (true,true) as "empty" - but none of that ended up being any simpler, quite the opposite actually, as it would change what the individual flags mean etc. So AFAICS this is the best / least disruptive option. I went over all the places touching these flags, to double check if any of those needs some tweaks (similar to union_tuples, which I missed for a long time). But I haven't found anything else, so I think this version of the patches is complete. As for assessing how many indexes are affected - in principle, any index on columns with NULLs may be broken. But it only matters if the index is used for IS NULL queries, other queries are not affected. I also realized that this only affects insertion of individual tuples into existing all-null summaries, not "bulk" summarization that sees all values at once. This happens because in this case add_values_to_range sets hasnulls=true for the first (NULL) value, and then calls the addValue procedure for the second (non-NULL) one, which resets the allnulls flag to false. But when inserting individual rows, we first set hasnulls=true, but brin_form_tuple ignores that because of allnulls=true. And then when inserting the second row, we start with hasnulls=false again, and the opclass quietly resets the allnulls flag. I guess this further reduces the number of broken indexes, especially for data sets with small null_frac, or for append-only (or -mostly) tables where most of the summarization is bulk. I still feel a bit uneasy about this, but I think the patch is solid. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 1/9/23 00:34, Tomas Vondra wrote: > > I've been working on this over the past couple days, trying to polish > and commit it over the weekend - both into master and backbranches. > Sadly, the backpatching part turned out to be a bit more complicated > than I expected, because of the BRIN reworks in PG14 (done by me, as > foundation for the new opclasses, so ... well). > > Anyway, I got it done, but it's a bit uglier than I hoped for and I > don't feel like pushing this on Sunday midnight. I think it's correct, > but maybe another pass to polish it a bit more is better. > > So here are two patches - one for 11-13, the other for 14-master. > I spent a bit more time on this fix. I realized there are two more places that need fixes. Firstly, the placeholder tuple needs to be marked as "empty" too, so that it can be correctly updated by other backends etc. Secondly, union_tuples had a couple bugs in handling empty ranges (this is related to the placeholder tuple changes). I wonder what's the best way to test this in an automated way - it's very dependent on timing of the concurrent updated. For example we need to do something like this: T1: run pg_summarize_range() until it inserts the placeholder tuple T2: do an insert into the page range (updates placeholder) T1: continue pg_summarize_range() to merge into the placeholder But there are no convenient ways to do this, I think. I had to check the various cases using breakpoints in gdb etc. I'm not very happy with the union_tuples() changes - it's quite verbose, perhaps a bit too verbose. We have to check for empty ranges first, and then various combinations of allnulls/hasnulls flags for both BRIN tuples. There are 9 combinations, and the current code just checks them one by one - I was getting repeatedly confused by the original code, but maybe it's too much. As for the backpatch, I tried to keep it as close to the 14+ fixes as possible, but it effectively backports some of the 14+ BRIN changes. In particular, 14+ moved most of the NULL-handling logic from opclasses to brin.c, and I think it's reasonable to do that for the backbranches too. The alternative is to apply the same fix to every BRIN_PROCNUM_UNION opclass procedure out there. I guess doing that for minmax+inclusion is not a huge deal, but what about external opclasses? And without the fix the indexes are effectively broken. Fixing this outside in brin.c (in the union procedure) fixes this for every opclass procedure, without any actual limitation of functinality (14+ does that anyway). But maybe someone thinks this is a bad idea and we should do something else in the backbranches? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Thanks for doing all this. (Do I understand correctly that this patch is not in the commitfest?) I think my mental model for this was that "allnulls" meant that either there are no values for the column in question or that the values were all nulls (For minmax without NULL handling, which is where this all started, these two things are essentially the same: the range is not to be returned. So this became a bug the instant I added handling for NULL values.) I failed to realize that these were two different things, and this is likely the origin of all these troubles. What do you think of using the unused bit in BrinTuple->bt_info to denote a range that contains no heap tuples? This also means we need it in BrinMemTuple, I think we can do this: @@ -44,6 +44,7 @@ typedef struct BrinValues typedef struct BrinMemTuple { bool bt_placeholder; /* this is a placeholder tuple */ + bool bt_empty_range; /* range has no tuples */ BlockNumber bt_blkno; /* heap blkno that the tuple is for */ MemoryContext bt_context; /* memcxt holding the bt_columns values */ /* output arrays for brin_deform_tuple: */ @@ -69,7 +70,7 @@ typedef struct BrinTuple * * 7th (high) bit: has nulls * 6th bit: is placeholder tuple - * 5th bit: unused + * 5th bit: range has no tuples * 4-0 bit: offset of data * --------------- */ @@ -82,7 +83,7 @@ typedef struct BrinTuple * bt_info manipulation macros */ #define BRIN_OFFSET_MASK 0x1F -/* bit 0x20 is not used at present */ +#define BRIN_EMPTY_RANGE 0x20 #define BRIN_PLACEHOLDER_MASK 0x40 #define BRIN_NULLS_MASK 0x80 (Note that bt_empty_range uses a hole in the struct, so there's no ABI change.) This is BRIN-tuple-level, not column-level, so conceptually it seems more appropriate. (In the case where both are empty in union_tuples, we can return without entering the per-attribute loop at all, though I admit it's not a very interesting case.) This approach avoids having to invent the strange combination of all+has to mean empty. On 2023-Feb-24, Tomas Vondra wrote: > I wonder what's the best > way to test this in an automated way - it's very dependent on timing of > the concurrent updated. For example we need to do something like this: > > T1: run pg_summarize_range() until it inserts the placeholder tuple > T2: do an insert into the page range (updates placeholder) > T1: continue pg_summarize_range() to merge into the placeholder > > But there are no convenient ways to do this, I think. I had to check the > various cases using breakpoints in gdb etc. Yeah, I struggled with this during initial development but in the end did nothing. I think we would need to introduce some new framework, perhaps Korotkov stop-events stuff at https://postgr.es/m/CAPpHfdsTeb+hBT5=qxghjNG_cHcJLDaNQ9sdy9vNwBF2E2PuZA@mail.gmail.com which seemed to me a good fit -- we would add a stop point after the placeholder tuple is inserted. > I'm not very happy with the union_tuples() changes - it's quite verbose, > perhaps a bit too verbose. We have to check for empty ranges first, and > then various combinations of allnulls/hasnulls flags for both BRIN > tuples. There are 9 combinations, and the current code just checks them > one by one - I was getting repeatedly confused by the original code, but > maybe it's too much. I think it's okay. I tried to make it more compact (by saying "these two combinations here are case 2, and these two other are case 4", and keeping each of the other combinations a separate case; so there are really 7 cases). But that doesn't make it any easier to follow, on the contrary it was more convoluted. I think a dozen extra lines of source is not a problem. > The alternative is to apply the same fix to every BRIN_PROCNUM_UNION > opclass procedure out there. I guess doing that for minmax+inclusion is > not a huge deal, but what about external opclasses? And without the fix > the indexes are effectively broken. Fixing this outside in brin.c (in > the union procedure) fixes this for every opclass procedure, without any > actual limitation of functinality (14+ does that anyway). About the hypothetical question, you could as well ask what about unicorns. I have never seen any hint that any external opclass exist. I am all for maintaining compatibility, but I think this concern is overblown for BRIN. Anyway, I think your proposed fix is better than changing individual 'union' support procs, so it doesn't matter. As far as I understood, you're now worried that there will be an incompatibility because we will fail to call the 'union' procedure in cases where we previously called it? In other words, you fear that some hypothetical opclass was handling the NULL values in some way that's incompatible with this? I haven't thought terribly hard about this, but I can't see a way for this to cause incompatibilities. > But maybe someone thinks this is a bad idea and we should do something > else in the backbranches? I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT] NULL handling from BRIN support functions") is better than what was there before, so I don't object to backpatching it now that we know it's necessary to fix a bug, and also we have field experience that the approach is solid. The attached patch is just a pointer to comments that I think need light edition. There's also a typo "bot" (for "both") in a comment that I think would go away if you accept my suggestion to store 'empty' at the tuple level. Note that I worked with the REL_14_STABLE sources, because for some reason I thought that that was the newest that needed backpatching of 72ccf55cb99c, but now that I'm finishing this email I realize that I should have used 13 instead /facepalm -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La persona que no quería pecar / estaba obligada a sentarse en duras y empinadas sillas / desprovistas, por cierto de blandos atenuantes" (Patricio Vogel)
Attachment
On 3/3/23 11:32, Alvaro Herrera wrote: > > Thanks for doing all this. (Do I understand correctly that this patch > is not in the commitfest?) > > I think my mental model for this was that "allnulls" meant that either > there are no values for the column in question or that the values were > all nulls (For minmax without NULL handling, which is where this all > started, these two things are essentially the same: the range is not to > be returned. So this became a bug the instant I added handling for NULL > values.) I failed to realize that these were two different things, and > this is likely the origin of all these troubles. > > What do you think of using the unused bit in BrinTuple->bt_info to > denote a range that contains no heap tuples? This also means we need it > in BrinMemTuple, I think we can do this: > > @@ -44,6 +44,7 @@ typedef struct BrinValues > typedef struct BrinMemTuple > { > bool bt_placeholder; /* this is a placeholder tuple */ > + bool bt_empty_range; /* range has no tuples */ > BlockNumber bt_blkno; /* heap blkno that the tuple is for */ > MemoryContext bt_context; /* memcxt holding the bt_columns values */ > /* output arrays for brin_deform_tuple: */ > @@ -69,7 +70,7 @@ typedef struct BrinTuple > * > * 7th (high) bit: has nulls > * 6th bit: is placeholder tuple > - * 5th bit: unused > + * 5th bit: range has no tuples > * 4-0 bit: offset of data > * --------------- > */ > @@ -82,7 +83,7 @@ typedef struct BrinTuple > * bt_info manipulation macros > */ > #define BRIN_OFFSET_MASK 0x1F > -/* bit 0x20 is not used at present */ > +#define BRIN_EMPTY_RANGE 0x20 > #define BRIN_PLACEHOLDER_MASK 0x40 > #define BRIN_NULLS_MASK 0x80 > > (Note that bt_empty_range uses a hole in the struct, so there's no ABI > change.) > > This is BRIN-tuple-level, not column-level, so conceptually it seems > more appropriate. (In the case where both are empty in union_tuples, we > can return without entering the per-attribute loop at all, though I > admit it's not a very interesting case.) This approach avoids having to > invent the strange combination of all+has to mean empty. > Oh, that's an interesting idea! I haven't realized there's an unused bit at the tuple level, and I absolutely agree it'd be a better match than having this in individual summaries (like now). It'd mean we'd not have the option to fix this withing the opclasses, because we only pass them the BrinValue and not the tuple. But if you think that's reasonable, that'd be OK. The other thing I was unsure is if the bit could be set for any existing tuples, but AFAICS that shouldn't be possible - brin_form_tuple does palloc0, so it should be 0. I suspect doing this might make the patch quite a bit simpler, actually. > > On 2023-Feb-24, Tomas Vondra wrote: > >> I wonder what's the best >> way to test this in an automated way - it's very dependent on timing of >> the concurrent updated. For example we need to do something like this: >> >> T1: run pg_summarize_range() until it inserts the placeholder tuple >> T2: do an insert into the page range (updates placeholder) >> T1: continue pg_summarize_range() to merge into the placeholder >> >> But there are no convenient ways to do this, I think. I had to check the >> various cases using breakpoints in gdb etc. > > Yeah, I struggled with this during initial development but in the end > did nothing. I think we would need to introduce some new framework, > perhaps Korotkov stop-events stuff at > https://postgr.es/m/CAPpHfdsTeb+hBT5=qxghjNG_cHcJLDaNQ9sdy9vNwBF2E2PuZA@mail.gmail.com > which seemed to me a good fit -- we would add a stop point after the > placeholder tuple is inserted. > Yeah, but we don't have that at the moment. I actually ended up adding a couple sleeps during development, which allowed me to hit just the right order of operations - a poor-man's version of those stop-events. Did work but hardly an acceptable approach. >> I'm not very happy with the union_tuples() changes - it's quite verbose, >> perhaps a bit too verbose. We have to check for empty ranges first, and >> then various combinations of allnulls/hasnulls flags for both BRIN >> tuples. There are 9 combinations, and the current code just checks them >> one by one - I was getting repeatedly confused by the original code, but >> maybe it's too much. > > I think it's okay. I tried to make it more compact (by saying "these > two combinations here are case 2, and these two other are case 4", and > keeping each of the other combinations a separate case; so there are > really 7 cases). But that doesn't make it any easier to follow, on the > contrary it was more convoluted. I think a dozen extra lines of source > is not a problem. > OK >> The alternative is to apply the same fix to every BRIN_PROCNUM_UNION >> opclass procedure out there. I guess doing that for minmax+inclusion is >> not a huge deal, but what about external opclasses? And without the fix >> the indexes are effectively broken. Fixing this outside in brin.c (in >> the union procedure) fixes this for every opclass procedure, without any >> actual limitation of functinality (14+ does that anyway). > > About the hypothetical question, you could as well ask what about > unicorns. I have never seen any hint that any external opclass exist. > I am all for maintaining compatibility, but I think this concern is > overblown for BRIN. Anyway, I think your proposed fix is better than > changing individual 'union' support procs, so it doesn't matter. > OK > As far as I understood, you're now worried that there will be an > incompatibility because we will fail to call the 'union' procedure in > cases where we previously called it? In other words, you fear that some > hypothetical opclass was handling the NULL values in some way that's > incompatible with this? I haven't thought terribly hard about this, but > I can't see a way for this to cause incompatibilities. > Yeah, the possible incompatibility is one concern - I have a hard time imagining such an opclass, because it'd have to handle NULLs in some strange way. But and as you noted, we're not aware of any external BRIN opclasses, so maybe this is OK. The other concern is more generic - as I mentioned, moving the NULL handling from opclasses to brin.c is what we did in PG14, so this feels a bit like a backport, and I dislike that a little bit. >> But maybe someone thinks this is a bad idea and we should do something >> else in the backbranches? > > I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT] > NULL handling from BRIN support functions") is better than what was > there before, so I don't object to backpatching it now that we know it's > necessary to fix a bug, and also we have field experience that the > approach is solid. > OK, good to hear. > The attached patch is just a pointer to comments that I think need light > edition. There's also a typo "bot" (for "both") in a comment that I > think would go away if you accept my suggestion to store 'empty' at the > tuple level. Note that I worked with the REL_14_STABLE sources, because > for some reason I thought that that was the newest that needed > backpatching of 72ccf55cb99c, but now that I'm finishing this email I > realize that I should have used 13 instead /facepalm > Thanks. I'll try to rework the patches to use the bt_info unused bit, and report back in a week or two. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, It took me a while but I finally got back to reworking this to use the bt_info bit, as proposed by Alvaro. And it turned out to work great, because (a) it's a tuple-level flag, i.e. the right place, and (b) it does not overload existing flags. This greatly simplified the code in add_values_to_range and (especially) union_tuples, making it much easier to understand, I think. One disadvantage is we are unable to see which ranges are empty in current pageinspect, but 0002 addresses that by adding "empty" column to the brin_page_items() output. That's a matter for master only, though. It's a trivial patch and it makes it easier/possible to test this, so we should consider to squeeze it into PG16. I did quite a bit of testing - the attached 0003 adds extra tests, but I don't propose to get this committed as is - it's rather overkill. Maybe some reduced version of it ... The hardest thing to test is the union_tuples() part, as it requires concurrent operations with "correct" timing. Easy to simulate by breakpoints in GDB, not so much in plain regression/TAP tests. There's also a stress tests, doing a lot of randomized summarizations, etc. Without the fix this failed in maybe 30% of runs, now I did ~100 runs without a single failure. I haven't done any backporting, but I think it should be simpler than with the earlier approach. I wonder if we need to care about starting to use the previously unused bit - I don't think so, in the worst case we'll just ignore it, but maybe I'm missing something (e.g. when using physical replication). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi, here's an updated version of the patch, including a backport version. I ended up making the code yet a bit closer to master by introducing add_values_to_range(). The current pre-14 code has the loop adding data to the BRIN tuple in two places, but with the "fixed" logic handling NULLs and the empty_range flag the amount of duplicated code got too high, so this seem reasonable. Both cases have a patch extending pageinspect to show the new flag, but obviously we should commit that only in master. In backbranches it's meant only to make testing easier. I plan to do a bit more testing, I'd welcome some feedback - it's a long-standing bug, and it'd be good to finally get this fixed. I don't think the patch can be made any simpler. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2023-Apr-23, Tomas Vondra wrote: > here's an updated version of the patch, including a backport version. I > ended up making the code yet a bit closer to master by introducing > add_values_to_range(). The current pre-14 code has the loop adding data > to the BRIN tuple in two places, but with the "fixed" logic handling > NULLs and the empty_range flag the amount of duplicated code got too > high, so this seem reasonable. In backbranches, the new field to BrinMemTuple needs to be at the end of the struct, to avoid ABI breakage. There's a comment in add_values_to_range with a typo "If the range was had". Also, "So we should not see empty range that was not modified" should perhaps be "should not see an empty range". (As for your FIXME comment in brin_memtuple_initialize, I think you're correct: we definitely need to reset bt_placeholder. Otherwise, we risk places that call it in a loop using a BrinMemTuple with one range with the flag set, in a range where that doesn't hold. It might be impossible for this to happen, given how narrow the conditions are on which bt_placeholder is used; but it seems safer to reset it anyway.) Some pgindent noise would be induced by this patch. I think it's worth cleaning up ahead of time. I did a quick experiment of turning the patches over -- applying test first, code fix after (requires some conflict fixing). With that I verified that the behavior of 'hasnulls' indeed changes with the code fix. > Both cases have a patch extending pageinspect to show the new flag, but > obviously we should commit that only in master. In backbranches it's > meant only to make testing easier. In backbranches, I think it should be reasonably easy to add a --1.7--1.7.1.sql file and set the default version to 1.7.1; that then enables us to have the functionality (and the tests) in older branches too. If you just add a --1.X.1--1.12.sql version to each branch that's identical to that branch's current pageinspect version upgrade script, that would let us have working upgrade paths for all branches. This is a bit laborious but straightforward enough. If you don't feel like adding that, I volunteer to add the necessary scripts to all branches after you commit the bugfix, and ensure that all upgrade paths work correctly. > I plan to do a bit more testing, I'd welcome some feedback - it's a > long-standing bug, and it'd be good to finally get this fixed. I don't > think the patch can be made any simpler. The approach looks good to me. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2023-Apr-23, Tomas Vondra wrote: >> Both cases have a patch extending pageinspect to show the new flag, but >> obviously we should commit that only in master. In backbranches it's >> meant only to make testing easier. > In backbranches, I think it should be reasonably easy to add a > --1.7--1.7.1.sql file and set the default version to 1.7.1; that then > enables us to have the functionality (and the tests) in older branches > too. If you just add a --1.X.1--1.12.sql version to each branch that's > identical to that branch's current pageinspect version upgrade script, > that would let us have working upgrade paths for all branches. This is > a bit laborious but straightforward enough. "A bit laborious"? That seems enormously out of proportion to the benefit of putting this test case into back branches. It will have costs for end users too, not only us. As an example, it would unecessarily block some upgrade paths, if the upgraded-to installation is slightly older and lacks the necessary --1.X.1--1.12 script. > If you don't feel like adding that, I volunteer to add the necessary > scripts to all branches after you commit the bugfix, and ensure that all > upgrade paths work correctly. I do not think this should happen at all, whether you're willing to do the work or not. regards, tom lane
On 4/24/23 17:46, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> On 2023-Apr-23, Tomas Vondra wrote: >>> Both cases have a patch extending pageinspect to show the new flag, but >>> obviously we should commit that only in master. In backbranches it's >>> meant only to make testing easier. > >> In backbranches, I think it should be reasonably easy to add a >> --1.7--1.7.1.sql file and set the default version to 1.7.1; that then >> enables us to have the functionality (and the tests) in older branches >> too. If you just add a --1.X.1--1.12.sql version to each branch that's >> identical to that branch's current pageinspect version upgrade script, >> that would let us have working upgrade paths for all branches. This is >> a bit laborious but straightforward enough. > > "A bit laborious"? That seems enormously out of proportion to the > benefit of putting this test case into back branches. It will have > costs for end users too, not only us. As an example, it would > unecessarily block some upgrade paths, if the upgraded-to installation > is slightly older and lacks the necessary --1.X.1--1.12 script. > Why would that block the upgrade? Presumably we'd add two upgrade scripts in the master, to allow upgrade both from 1.X and 1.X.1. >> If you don't feel like adding that, I volunteer to add the necessary >> scripts to all branches after you commit the bugfix, and ensure that all >> upgrade paths work correctly. > > I do not think this should happen at all, whether you're willing to > do the work or not. FWIW I'm fine with not doing that. As mentioned, I only included this patch to make testing the patch easier (otherwise the flag is impossible to inspect directly). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 4/24/23 17:46, Tom Lane wrote: >> "A bit laborious"? That seems enormously out of proportion to the >> benefit of putting this test case into back branches. It will have >> costs for end users too, not only us. As an example, it would >> unecessarily block some upgrade paths, if the upgraded-to installation >> is slightly older and lacks the necessary --1.X.1--1.12 script. > Why would that block the upgrade? Presumably we'd add two upgrade > scripts in the master, to allow upgrade both from 1.X and 1.X.1. It would for example block updating from 14.8 or later to 15.2, since a 15.2 installation would not have the script to update from 1.X.1. Yeah, people could work around that by only installing the latest version, but there are plenty of real-world scenarios where you'd be creating friction, or at least confusion. I do not think that this test case is worth it. regards, tom lane
On 4/24/23 23:05, Tomas Vondra wrote: > > > On 4/24/23 17:46, Tom Lane wrote: >> Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >>> On 2023-Apr-23, Tomas Vondra wrote: >>>> Both cases have a patch extending pageinspect to show the new flag, but >>>> obviously we should commit that only in master. In backbranches it's >>>> meant only to make testing easier. >> >>> In backbranches, I think it should be reasonably easy to add a >>> --1.7--1.7.1.sql file and set the default version to 1.7.1; that then >>> enables us to have the functionality (and the tests) in older branches >>> too. If you just add a --1.X.1--1.12.sql version to each branch that's >>> identical to that branch's current pageinspect version upgrade script, >>> that would let us have working upgrade paths for all branches. This is >>> a bit laborious but straightforward enough. >> >> "A bit laborious"? That seems enormously out of proportion to the >> benefit of putting this test case into back branches. It will have >> costs for end users too, not only us. As an example, it would >> unecessarily block some upgrade paths, if the upgraded-to installation >> is slightly older and lacks the necessary --1.X.1--1.12 script. >> > > Why would that block the upgrade? Presumably we'd add two upgrade > scripts in the master, to allow upgrade both from 1.X and 1.X.1. > D'oh! I just realized I misunderstood the issue. Yes, if the target version is missing the new script, that won't work. I'm not sure how likely that is - in my experience people refresh versions at the same time - but it's certainly an assumption we shouldn't do, I guess. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 4/24/23 17:36, Alvaro Herrera wrote: > On 2023-Apr-23, Tomas Vondra wrote: > >> here's an updated version of the patch, including a backport version. I >> ended up making the code yet a bit closer to master by introducing >> add_values_to_range(). The current pre-14 code has the loop adding data >> to the BRIN tuple in two places, but with the "fixed" logic handling >> NULLs and the empty_range flag the amount of duplicated code got too >> high, so this seem reasonable. > > In backbranches, the new field to BrinMemTuple needs to be at the end of > the struct, to avoid ABI breakage. > Good point. > There's a comment in add_values_to_range with a typo "If the range was had". > Also, "So we should not see empty range that was not modified" should > perhaps be "should not see an empty range". > OK, I'll check the wording of the comments. > (As for your FIXME comment in brin_memtuple_initialize, I think you're > correct: we definitely need to reset bt_placeholder. Otherwise, we risk > places that call it in a loop using a BrinMemTuple with one range with > the flag set, in a range where that doesn't hold. It might be > impossible for this to happen, given how narrow the conditions are on > which bt_placeholder is used; but it seems safer to reset it anyway.) > Yeah. But isn't that a separate preexisting issue, strictly speaking? > Some pgindent noise would be induced by this patch. I think it's worth > cleaning up ahead of time. > True. Will do. > I did a quick experiment of turning the patches over -- applying test > first, code fix after (requires some conflict fixing). With that I > verified that the behavior of 'hasnulls' indeed changes with the code > fix. > Thanks. Could you do some testing of the union_tuples stuff too? It's a bit tricky part - the behavior is timing sensitive, so testing it requires gdb breakpoints breakpoints or something like that. I think it's correct, but it'd be nice to check that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2023-Apr-24, Tomas Vondra wrote: > On 4/24/23 17:36, Alvaro Herrera wrote: > > (As for your FIXME comment in brin_memtuple_initialize, I think you're > > correct: we definitely need to reset bt_placeholder. Otherwise, we risk > > places that call it in a loop using a BrinMemTuple with one range with > > the flag set, in a range where that doesn't hold. It might be > > impossible for this to happen, given how narrow the conditions are on > > which bt_placeholder is used; but it seems safer to reset it anyway.) > > Yeah. But isn't that a separate preexisting issue, strictly speaking? Yes. > > I did a quick experiment of turning the patches over -- applying test > > first, code fix after (requires some conflict fixing). With that I > > verified that the behavior of 'hasnulls' indeed changes with the code > > fix. > > Thanks. Could you do some testing of the union_tuples stuff too? It's a > bit tricky part - the behavior is timing sensitive, so testing it > requires gdb breakpoints breakpoints or something like that. I think > it's correct, but it'd be nice to check that. I'll have a look. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ <inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell <crab> inflex: you know that "amalgam" means "mixture with mercury", more or less, right? <crab> i.e., "deadly poison"
On 4/24/23 23:20, Tomas Vondra wrote: > On 4/24/23 17:36, Alvaro Herrera wrote: >> On 2023-Apr-23, Tomas Vondra wrote: >> >>> here's an updated version of the patch, including a backport version. I >>> ended up making the code yet a bit closer to master by introducing >>> add_values_to_range(). The current pre-14 code has the loop adding data >>> to the BRIN tuple in two places, but with the "fixed" logic handling >>> NULLs and the empty_range flag the amount of duplicated code got too >>> high, so this seem reasonable. >> >> In backbranches, the new field to BrinMemTuple needs to be at the end of >> the struct, to avoid ABI breakage. >> Unfortunately, this is not actually possible :-( The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't place anything after it. I think we have three options: a) some other approach? - I really can't see any, except maybe for going back to the previous approach (i.e. encoding the info using the existing BrinValues allnulls/hasnulls flags) b) encoding the info in existing BrinMemTuple flags - e.g. we could use bt_placeholder to store two bits, not just one. Seems a bit ugly. c) ignore the issue - AFAICS this would be an issue only for (external) code accessing BrinMemTuple structs, but I don't think we're aware of any out-of-core BRIN opclasses or anything like that ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On Sun, May 07, 2023 at 12:13:07AM +0200, Tomas Vondra wrote: > > c) ignore the issue - AFAICS this would be an issue only for (external) > code accessing BrinMemTuple structs, but I don't think we're aware of > any out-of-core BRIN opclasses or anything like that ... FTR there's at least postgis that implments BRIN opclasses (for geometries and geographies), but there's nothing fancy in the implementation and it doesn't access BrinMemTuple struct.
On 5/7/23 07:08, Julien Rouhaud wrote: > Hi, > > On Sun, May 07, 2023 at 12:13:07AM +0200, Tomas Vondra wrote: >> >> c) ignore the issue - AFAICS this would be an issue only for (external) >> code accessing BrinMemTuple structs, but I don't think we're aware of >> any out-of-core BRIN opclasses or anything like that ... > > FTR there's at least postgis that implments BRIN opclasses (for geometries and > geographies), but there's nothing fancy in the implementation and it doesn't > access BrinMemTuple struct. Right. I believe that should be fine, because opclasses don't access the tuple directly - instead we pass pointers to individual pieces. But maybe it'd be a good idea to test this. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2023-May-07, Tomas Vondra wrote: > > Álvaro wrote: > >> In backbranches, the new field to BrinMemTuple needs to be at the end of > >> the struct, to avoid ABI breakage. > > Unfortunately, this is not actually possible :-( > > The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't > place anything after it. I think we have three options: > > a) some other approach? - I really can't see any, except maybe for going > back to the previous approach (i.e. encoding the info using the existing > BrinValues allnulls/hasnulls flags) Actually, mine was quite the stupid suggestion: the BrinMemTuple already has a 3 byte hole in the place where you originally wanted to add the flag: struct BrinMemTuple { _Bool bt_placeholder; /* 0 1 */ /* XXX 3 bytes hole, try to pack */ BlockNumber bt_blkno; /* 4 4 */ MemoryContext bt_context; /* 8 8 */ Datum * bt_values; /* 16 8 */ _Bool * bt_allnulls; /* 24 8 */ _Bool * bt_hasnulls; /* 32 8 */ BrinValues bt_columns[]; /* 40 0 */ /* size: 40, cachelines: 1, members: 7 */ /* sum members: 37, holes: 1, sum holes: 3 */ /* last cacheline: 40 bytes */ }; so putting it there was already not causing any ABI breakage. So, the solution to this problem of not being able to put it at the end is just to return the struct to your original formulation. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La primera ley de las demostraciones en vivo es: no trate de usar el sistema. Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
On 5/15/23 12:06, Alvaro Herrera wrote: > On 2023-May-07, Tomas Vondra wrote: > >>> Álvaro wrote: >>>> In backbranches, the new field to BrinMemTuple needs to be at the end of >>>> the struct, to avoid ABI breakage. >> >> Unfortunately, this is not actually possible :-( >> >> The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't >> place anything after it. I think we have three options: >> >> a) some other approach? - I really can't see any, except maybe for going >> back to the previous approach (i.e. encoding the info using the existing >> BrinValues allnulls/hasnulls flags) > > Actually, mine was quite the stupid suggestion: the BrinMemTuple already > has a 3 byte hole in the place where you originally wanted to add the > flag: > > struct BrinMemTuple { > _Bool bt_placeholder; /* 0 1 */ > > /* XXX 3 bytes hole, try to pack */ > > BlockNumber bt_blkno; /* 4 4 */ > MemoryContext bt_context; /* 8 8 */ > Datum * bt_values; /* 16 8 */ > _Bool * bt_allnulls; /* 24 8 */ > _Bool * bt_hasnulls; /* 32 8 */ > BrinValues bt_columns[]; /* 40 0 */ > > /* size: 40, cachelines: 1, members: 7 */ > /* sum members: 37, holes: 1, sum holes: 3 */ > /* last cacheline: 40 bytes */ > }; > > so putting it there was already not causing any ABI breakage. So, the > solution to this problem of not being able to put it at the end is just > to return the struct to your original formulation. > Thanks, that's pretty lucky. It means we're not breaking on-disk format nor ABI, which is great. Attached is a final version of the patches - I intend to do a bit more testing, go through the comments once more, and get this committed today or perhaps tomorrow morning, so that it gets into beta1. Unfortunately, while polishing the patches, I realized union_tuples() has yet another long-standing bug with handling NULL values, because it does this: /* Adjust "hasnulls". */ if (!col_a->bv_hasnulls && col_b->bv_hasnulls) col_a->bv_hasnulls = true; but let's assume "col_a" is a summary representing "1" and "col_b" represents NULL (col_b->bv_hasnulls=false col_b->bv_allnulls=true). Well, in that case we fail to "remember" col_a should represent NULL values too :-( This is somewhat separate issue, because it's unrelated to empty ranges (neither of the two ranges is empty). It's hard to test it, because it requires a particular timing of the concurrent actions, but a breakpoint in brin.c on the brin_can_do_samepage_update call (in summarize_range) does the trick for manual testing. 0001 fixes the issue. 0002 is the original fix, and 0003 is just the pageinspect changes (for master only). For the backbranches, I thought about making the code more like master (by moving some of the handling from opclasses to brin.c), but decided not to. It'd be low-risk, but it feels wrong to kinda do what the master does under "oi_regular_nulls" flag. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 5/18/23 20:45, Tomas Vondra wrote: > ... > > 0001 fixes the issue. 0002 is the original fix, and 0003 is just the > pageinspect changes (for master only). > > For the backbranches, I thought about making the code more like master > (by moving some of the handling from opclasses to brin.c), but decided > not to. It'd be low-risk, but it feels wrong to kinda do what the master > does under "oi_regular_nulls" flag. > I've now pushed all these patches into relevant branches, after some minor last-minute tweaks, and so far it didn't cause any buildfarm issues. Assuming this fully fixes the NULL-handling for BRIN, this leaves just the deadlock issue discussed in [1]. It seems rather unfortunate all these issues went unnoticed / unreported essentially since BRIN was introduced in 9.5. To some extent it might be explained by fairly low likelihood of actually hitting the issue (just the right timing, concurrency with summarization, NULL values, ...). It took me quite a bit of time and luck to (accidentally) hit these issues while stress testing the code. But there's also the problem of writing tests for this kind of thing. To exercise the interesting parts (e.g. the union_tuples), it's necessary to coordinate the order of concurrent steps - but what's a good generic way to do that (which we could do in TAP tests)? In manual testing it's doable by setting breakpoints on a particular lines, and step through the concurrent processes that way. But that doesn't seem like a particularly great solution for regression tests. I can imagine adding some sort of "probes" into the code and then attaching breakpoints to those, but surely we're not the first project needing this ... regards [1] https://www.postgresql.org/message-id/261e68bc-f5f5-5234-fb2c-af4f583513c0@enterprisedb.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company