Thread: Missing update of all_hasnulls in BRIN opclasses

Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
Hi,

While working on some BRIN code, I discovered a bug in handling NULL
values - when inserting a non-NULL value into a NULL-only range, we
reset the all_nulls flag but don't update the has_nulls flag. And
because of that we then fail to return the range for IS NULL ranges.

Reproducing this is trivial:

  create table t (a int);
  create index on t using brin (a);
  insert into t values (null);
  insert into t values (1);

  set enable_seqscan  = off;
  select * from t where a is null;

This should return 1 row, but actually it returns no rows.

Attached is a patch fixing this by properly updating the has_nulls flag.

I reproduced this all the way back to 9.5, so it's a long-standing bug.
It's interesting no one noticed / reported it so far, it doesn't seem
like a particularly rare corner case.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Matthias van de Meent
Date:
On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> While working on some BRIN code, I discovered a bug in handling NULL
> values - when inserting a non-NULL value into a NULL-only range, we
> reset the all_nulls flag but don't update the has_nulls flag. And
> because of that we then fail to return the range for IS NULL ranges.

Ah, that's bad.

One question though: doesn't (shouldn't?) column->bv_allnulls already
imply column->bv_hasnulls? The column has nulls if all of the values
are null, right? Or is the description of the field deceptive, and
does bv_hasnulls actually mean "has nulls bitmap"?

> Attached is a patch fixing this by properly updating the has_nulls flag.

One comment on the patch:

> +SET enable_seqscan = off;
> + [...]
> +SET enable_seqscan = off;

Looks like duplicated SETs. Should that last one be RESET instead?

Apart from that, this patch looks good.

- Matthias



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:

On 10/21/22 17:50, Matthias van de Meent wrote:
> On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> While working on some BRIN code, I discovered a bug in handling NULL
>> values - when inserting a non-NULL value into a NULL-only range, we
>> reset the all_nulls flag but don't update the has_nulls flag. And
>> because of that we then fail to return the range for IS NULL ranges.
> 
> Ah, that's bad.
> 

Yeah, I guess we'll need to inform the users to consider rebuilding BRIN
indexes on NULL-able columns.

> One question though: doesn't (shouldn't?) column->bv_allnulls already
> imply column->bv_hasnulls? The column has nulls if all of the values
> are null, right? Or is the description of the field deceptive, and
> does bv_hasnulls actually mean "has nulls bitmap"?
> 

What null bitmap do you mean? We're talking about summary for a page
range - IIRC we translate this to nullbitmap for a BRIN tuple, but there
may be multiple columns, and "has nulls bitmap" is an aggregate over all
of them.

Yeah, maybe it'd make sense to also have has_nulls=true whenever
all_nulls=true, and maybe it'd be simpler because it'd be enough to
check just one flag in consistent function etc. But we still need to
track 2 different states - "has nulls" and "has summary".

In any case, this ship sailed long ago - at least for the existing
opclasses.


>> Attached is a patch fixing this by properly updating the has_nulls flag.
> 
> One comment on the patch:
> 
>> +SET enable_seqscan = off;
>> + [...]
>> +SET enable_seqscan = off;
> 
> Looks like duplicated SETs. Should that last one be RESET instead?
> 

Yeah, should have been RESET.

> Apart from that, this patch looks good.
> 

Thanks!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
On 10/21/22 18:44, Tomas Vondra wrote:
> 
> ...
>
>> Apart from that, this patch looks good.
>>

Sadly, I don't think we can fix it like this :-(

The problem is that all ranges start with all_nulls=true, because the
new range gets initialized by brin_memtuple_initialize() like that. But
this happens for *every* range before we even start processing the rows.
So this way all the ranges would end up with has_nulls=true, making that
flag pretty useless.

Actually, even just doing "truncate" on the table creates such all-nulls
range for the first range, and serializes it to disk.

I wondered why we even write such tuples for "empty" ranges to disk, for
example after "TRUNCATE" - the table is empty by definition, so how come
we write all-nulls brin summary for the first range?

For example brininsert() checks if the brin tuple was modified and needs
to be written back, but brinbuild() just ignores that, and initializes
(and writes) writes the tuple to disk anyway. I think we should not do
that - there should be a flag in BrinBuildState, tracking if the BRIN
tuple was modified, and we should only write it if it's true.

That means we should never get an on-disk summary representing nothing.

That doesn't fix the issue, though, because we still need to pass the
memtuple tuple to the add_value opclass procedure, and whether it sets
the has_nulls flag depends on whether it's a new tuple representing no
other rows (in which case has_nulls remains false) or whether it was
read from disk (in which case it needs to be flipped to 'true').

But the opclass has no way to tell the difference at the moment - it
just gets the BrinMemTuple. So we'd have to extend this, somehow.

I wonder how to do this in a back-patchable way - we can't add
parameters to the opclass procedure, and the other solution seems to be
storing it right in the BrinMemTuple, somehow. But that's likely an ABI
break :-(

The only solution I can think of is actually passing it using all_nulls
and has_nulls - we could set both flags to true (which we never do now)
and teach the opclass that it signifies "empty" (and thus not to update
has_nulls after resetting all_nulls).

Something like the attached (I haven't added any more tests, not sure
what would those look like - I can't think of a query testing this,
although maybe we could check how the flags change using pageinspect).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Alvaro Herrera
Date:
On 2022-Oct-22, Tomas Vondra wrote:

> I wonder how to do this in a back-patchable way - we can't add
> parameters to the opclass procedure, and the other solution seems to be
> storing it right in the BrinMemTuple, somehow. But that's likely an ABI
> break :-(

Hmm, I don't see the ABI incompatibility.  BrinMemTuple is an in-memory
structure, so you can add new members at the end of the struct and it
will pose no problems to existing code.

> The only solution I can think of is actually passing it using all_nulls
> and has_nulls - we could set both flags to true (which we never do now)
> and teach the opclass that it signifies "empty" (and thus not to update
> has_nulls after resetting all_nulls).
> 
> Something like the attached (I haven't added any more tests, not sure
> what would those look like - I can't think of a query testing this,
> although maybe we could check how the flags change using pageinspect).

I'll try to have a look at these patches tomorrow or on Monday.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
On 10/22/22 10:00, Alvaro Herrera wrote:
> On 2022-Oct-22, Tomas Vondra wrote:
> 
>> I wonder how to do this in a back-patchable way - we can't add
>> parameters to the opclass procedure, and the other solution seems to be
>> storing it right in the BrinMemTuple, somehow. But that's likely an ABI
>> break :-(
> 
> Hmm, I don't see the ABI incompatibility.  BrinMemTuple is an in-memory
> structure, so you can add new members at the end of the struct and it
> will pose no problems to existing code.
> 

But we're not passing BrinMemTuple to the opclass procedures - we're
passing a pointer to BrinValues, so we'd have to add the flag there. And
we're storing an array of those, so adding a field may shift the array
even if you add it at the end. Not sure if that's OK or not.

>> The only solution I can think of is actually passing it using all_nulls
>> and has_nulls - we could set both flags to true (which we never do now)
>> and teach the opclass that it signifies "empty" (and thus not to update
>> has_nulls after resetting all_nulls).
>>
>> Something like the attached (I haven't added any more tests, not sure
>> what would those look like - I can't think of a query testing this,
>> although maybe we could check how the flags change using pageinspect).
> 
> I'll try to have a look at these patches tomorrow or on Monday.
> 

I was experimenting with this a bit more, and unfortunately the latest
patch is still a few bricks shy - it did fix this particular issue, but
there were other cases that remained/got broken. See the first patch,
that adds a bunch of pageinspect tests testing different combinations.

After thinking about it a bit more, I think we can't quite fix this at
the opclass level, so the yesterday's patches are wrong. Instead, this
should be fixed in values_add_to_range() - the whole trick is we need to
remember the range was empty at the beginning, and only set the flag
when allnulls is false.

The reworked patch does that. And we can use the same logic (both flags
set mean no tuples were added to the range) when building the index, a
separate flag is not needed.

This slightly affects existing regression tests, because we won't create
any ranges for empty table (now we created one, because we initialized a
tuple in brinbuild and then wrote it to disk). This means that
brin_summarize_range now returns 0, but I think that's fine.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Zhihong Yu
Date:
Hi, Tomas:
For 0002-fixup-brin-has_nulls-20221022.patch :

+       first_row = (bval->bv_hasnulls && bval->bv_allnulls);
+
+       if (bval->bv_hasnulls && bval->bv_allnulls)

It seems the if condition can be changed to `if (first_row)` which is more readable.

Chhers

Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
Here's an improved version of the fix I posted about a month ago.

0001

Adds tests demonstrating the issue, as before. I realized there's an
isolation test in src/test/module/brin that can demonstrate this, so I
modified it too, not just the pageinspect test as before.


0002

Uses the combination of all_nulls/has_nulls to identify "empty" range,
and does not store them to disk. I however realized not storing "empty"
ranges is probably not desirable. Imagine a table with a "gap" (e.g. due
to a batch DELETE) of pages with no rows:

    create table x (a int) with (fillfactor = 10);
    insert into x select i from generate_series(1,1000) s(i);
    delete from x where a < 1000;
    create index on x using brin(a) with (pages_per_range=1);

Any bitmap index scan using this index would have to scan all those
empty ranges, because there are no summaries.


0003

Still uses the all_nulls/has_nulls flags to identify empty ranges, but
stores them - and then we check the combination in bringetbitmap() to
skip those ranges as not matching any scan keys.

This also restores some of the existing behavior - for example creating
a BRIN index on entirely empty table (no pages at all) still allocates a
48kB index (3 index pages, 3 fsm pages). Seems a bit strange, but it's
an existing behavior.


As explained before, I've considered adding an new flag to one of the
BRIN structs - BrinMemTuple or BrinValues. But we can't add as last
field to BrinMemTuple because there already is FLEXIBLE_ARRAY_MEMBER,
and adding a field to BrinValues would change stride of the bt_columns
array. So this would break ABI, making this not backpatchable.

Furthermore, if we want to store summaries for empty ranges (which is
what 0003 does), we need to store the flag in the BRIN index tuple. And
we can't change the on-disk representation in backbranches, so encoding
this in the existing tuple seems like the only way.

So using the combination of all_nulls/has_nulls flag seems like the only
viable option, unfortunately.

Opinions? Considering this will need to be backpatches, it'd be good to
get some feedback on the approach. I think it's fine, but it would be
unfortunate to fix one issue but break BRIN in a different way.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Justin Pryzby
Date:
On Mon, Nov 28, 2022 at 01:13:14AM +0100, Tomas Vondra wrote:
> Opinions? Considering this will need to be backpatches, it'd be good to
> get some feedback on the approach. I think it's fine, but it would be
> unfortunate to fix one issue but break BRIN in a different way.

> --- a/contrib/pageinspect/Makefile
> +++ b/contrib/pageinspect/Makefile
> @@ -22,7 +22,7 @@ DATA =  pageinspect--1.10--1.11.sql \
>      pageinspect--1.0--1.1.sql
>  PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
>  
> -REGRESS = page btree brin gin gist hash checksum oldextversions
> +REGRESS = page btree brin gin gist hash checksum oldextversions brinbugs

I can't comment on the patch itself, but:

These changes to ./Makefile will also need to be made in ./meson.build.

Also (per cirrusci), the test sometimes fail since two parallel tests
are doing "CREATE EXTENSION".



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
Hi,

here's an improved and cleaned-up version of the fix.

I removed brinbugs.sql from pageinspect, because it seems enough to have
the other tests (I added brinbugs first, before realizing those exist).
This also means meson.build is fine and there are no tests doing CREATE
EXTENSION concurrently etc.

I decided to go with the 0003 approach, which stores summaries for empty
ranges. That seems to be less intrusive (it's more like what we do now),
and works better for tables with a lot of bulk deletes. It means we can
have ranges with allnulls=hasnulls=true, which wasn't the case before,
but I don't see why this should break e.g. custom opclasses (if it does,
it probably means the opclass is wrong).

Finally, I realized union_tuples needs to be tweaked to deal with empty
ranges properly. The changes are fairly limited, though.

I plan to push this into master right at the beginning of January, and
then backpatch a couple days later.

I still feel a bit uneasy about tweaking this, but I don't think there's
a better way than reusing the existing flags.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Justin Pryzby
Date:
On Fri, Dec 30, 2022 at 01:18:36AM +0100, Tomas Vondra wrote:
> +         * Does the range already has NULL values? Either of the flags can

should say: "already have NULL values"

> +         * If we had NULLS, and the opclass didn't set allnulls=true, set
> +         * the hasnulls so that we know there are NULL values.

You could remove "the" before "hasnulls".
Or say "clear hasnulls so that.."

> @@ -585,6 +587,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple)
>      {
>          int            i;
>  
> +        /*
> +         * Make sure to overwrite the hasnulls flag, because it was initialized
> +         * to true by brin_memtuple_initialize and we don't want to skip it if
> +         * allnulls.

Does "if allnulls" mean "if allnulls is true" ?
It's a bit unclear.

> +         */
> +        dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
> +
>          if (allnulls[keyno])
>          {
>              valueno += brdesc->bd_info[keyno]->oi_nstored;

-- 
Justin



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
Thanks Justin! I've applied all the fixes you proposed, and (hopefully)
improved a couple other comments.

I've been working on this over the past couple days, trying to polish
and commit it over the weekend - both into master and backbranches.
Sadly, the backpatching part turned out to be a bit more complicated
than I expected, because of the BRIN reworks in PG14 (done by me, as
foundation for the new opclasses, so ... well).

Anyway, I got it done, but it's a bit uglier than I hoped for and I
don't feel like pushing this on Sunday midnight. I think it's correct,
but maybe another pass to polish it a bit more is better.

So here are two patches - one for 11-13, the other for 14-master.

There's also a separate patch with pageinspect tests, but only as a
demonstration of various (non)broken cases, not for commit. And then
also a bash script generating indexes with random data, randomized
summarization etc. - on unpatched systems this happens to fail in about
1/3 of the runs (at least for me). I haven't seen any failures with the
patches attached (on any branch).

As for the issue / fix, I don't think there's a better solution than
what the patch does - we need to distinguish empty / all-nulls ranges,
but we can't add a flag because of on-disk format / ABI. So using the
existing flags seems like the only option - I haven't heard any other
ideas so far, and I couldn't come up with any myself either.

I've also thought about alternative "encodings" into allnulls/hasnulls,
instead of treating (true,true) as "empty" - but none of that ended up
being any simpler, quite the opposite actually, as it would change what
the individual flags mean etc. So AFAICS this is the best / least
disruptive option.

I went over all the places touching these flags, to double check if any
of those needs some tweaks (similar to union_tuples, which I missed for
a long time). But I haven't found anything else, so I think this version
of the patches is complete.

As for assessing how many indexes are affected - in principle, any index
on columns with NULLs may be broken. But it only matters if the index is
used for IS NULL queries, other queries are not affected.

I also realized that this only affects insertion of individual tuples
into existing all-null summaries, not "bulk" summarization that sees all
values at once. This happens because in this case add_values_to_range
sets hasnulls=true for the first (NULL) value, and then calls the
addValue procedure for the second (non-NULL) one, which resets the
allnulls flag to false.

But when inserting individual rows, we first set hasnulls=true, but
brin_form_tuple ignores that because of allnulls=true. And then when
inserting the second row, we start with hasnulls=false again, and the
opclass quietly resets the allnulls flag.

I guess this further reduces the number of broken indexes, especially
for data sets with small null_frac, or for append-only (or -mostly)
tables where most of the summarization is bulk.

I still feel a bit uneasy about this, but I think the patch is solid.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:

On 1/9/23 00:34, Tomas Vondra wrote:
> 
> I've been working on this over the past couple days, trying to polish
> and commit it over the weekend - both into master and backbranches.
> Sadly, the backpatching part turned out to be a bit more complicated
> than I expected, because of the BRIN reworks in PG14 (done by me, as
> foundation for the new opclasses, so ... well).
> 
> Anyway, I got it done, but it's a bit uglier than I hoped for and I
> don't feel like pushing this on Sunday midnight. I think it's correct,
> but maybe another pass to polish it a bit more is better.
> 
> So here are two patches - one for 11-13, the other for 14-master.
> 

I spent a bit more time on this fix. I realized there are two more
places that need fixes.

Firstly, the placeholder tuple needs to be marked as "empty" too, so
that it can be correctly updated by other backends etc.

Secondly, union_tuples had a couple bugs in handling empty ranges (this
is related to the placeholder tuple changes). I wonder what's the best
way to test this in an automated way - it's very dependent on timing of
the concurrent updated. For example we need to do something like this:

    T1: run pg_summarize_range() until it inserts the placeholder tuple
    T2: do an insert into the page range (updates placeholder)
    T1: continue pg_summarize_range() to merge into the placeholder

But there are no convenient ways to do this, I think. I had to check the
various cases using breakpoints in gdb etc.

I'm not very happy with the union_tuples() changes - it's quite verbose,
perhaps a bit too verbose. We have to check for empty ranges first, and
then various combinations of allnulls/hasnulls flags for both BRIN
tuples. There are 9 combinations, and the current code just checks them
one by one - I was getting repeatedly confused by the original code, but
maybe it's too much.

As for the backpatch, I tried to keep it as close to the 14+ fixes as
possible, but it effectively backports some of the 14+ BRIN changes. In
particular, 14+ moved most of the NULL-handling logic from opclasses to
brin.c, and I think it's reasonable to do that for the backbranches too.

The alternative is to apply the same fix to every BRIN_PROCNUM_UNION
opclass procedure out there. I guess doing that for minmax+inclusion is
not a huge deal, but what about external opclasses? And without the fix
the indexes are effectively broken. Fixing this outside in brin.c (in
the union procedure) fixes this for every opclass procedure, without any
actual limitation of functinality (14+ does that anyway).

But maybe someone thinks this is a bad idea and we should do something
else in the backbranches?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Alvaro Herrera
Date:
Thanks for doing all this.  (Do I understand correctly that this patch
is not in the commitfest?)

I think my mental model for this was that "allnulls" meant that either
there are no values for the column in question or that the values were
all nulls (For minmax without NULL handling, which is where this all
started, these two things are essentially the same: the range is not to
be returned.  So this became a bug the instant I added handling for NULL
values.)  I failed to realize that these were two different things, and
this is likely the origin of all these troubles.

What do you think of using the unused bit in BrinTuple->bt_info to
denote a range that contains no heap tuples?  This also means we need it
in BrinMemTuple, I think we can do this:

@@ -44,6 +44,7 @@ typedef struct BrinValues
 typedef struct BrinMemTuple
 {
     bool        bt_placeholder; /* this is a placeholder tuple */
+    bool        bt_empty_range;    /* range has no tuples */
     BlockNumber bt_blkno;        /* heap blkno that the tuple is for */
     MemoryContext bt_context;    /* memcxt holding the bt_columns values */
     /* output arrays for brin_deform_tuple: */
@@ -69,7 +70,7 @@ typedef struct BrinTuple
      *
      * 7th (high) bit: has nulls
      * 6th bit: is placeholder tuple
-     * 5th bit: unused
+     * 5th bit: range has no tuples
      * 4-0 bit: offset of data
      * ---------------
      */
@@ -82,7 +83,7 @@ typedef struct BrinTuple
  * bt_info manipulation macros
  */
 #define BRIN_OFFSET_MASK        0x1F
-/* bit 0x20 is not used at present */
+#define BRIN_EMPTY_RANGE        0x20
 #define BRIN_PLACEHOLDER_MASK    0x40
 #define BRIN_NULLS_MASK            0x80
 
(Note that bt_empty_range uses a hole in the struct, so there's no ABI
change.)

This is BRIN-tuple-level, not column-level, so conceptually it seems
more appropriate.  (In the case where both are empty in union_tuples, we
can return without entering the per-attribute loop at all, though I
admit it's not a very interesting case.)  This approach avoids having to
invent the strange combination of all+has to mean empty.


On 2023-Feb-24, Tomas Vondra wrote:

> I wonder what's the best
> way to test this in an automated way - it's very dependent on timing of
> the concurrent updated. For example we need to do something like this:
> 
>     T1: run pg_summarize_range() until it inserts the placeholder tuple
>     T2: do an insert into the page range (updates placeholder)
>     T1: continue pg_summarize_range() to merge into the placeholder
> 
> But there are no convenient ways to do this, I think. I had to check the
> various cases using breakpoints in gdb etc.

Yeah, I struggled with this during initial development but in the end
did nothing.  I think we would need to introduce some new framework,
perhaps Korotkov stop-events stuff at 
https://postgr.es/m/CAPpHfdsTeb+hBT5=qxghjNG_cHcJLDaNQ9sdy9vNwBF2E2PuZA@mail.gmail.com
which seemed to me a good fit -- we would add a stop point after the
placeholder tuple is inserted.

> I'm not very happy with the union_tuples() changes - it's quite verbose,
> perhaps a bit too verbose. We have to check for empty ranges first, and
> then various combinations of allnulls/hasnulls flags for both BRIN
> tuples. There are 9 combinations, and the current code just checks them
> one by one - I was getting repeatedly confused by the original code, but
> maybe it's too much.

I think it's okay.  I tried to make it more compact (by saying "these
two combinations here are case 2, and these two other are case 4", and
keeping each of the other combinations a separate case; so there are
really 7 cases).  But that doesn't make it any easier to follow, on the
contrary it was more convoluted.  I think a dozen extra lines of source
is not a problem.

> The alternative is to apply the same fix to every BRIN_PROCNUM_UNION
> opclass procedure out there. I guess doing that for minmax+inclusion is
> not a huge deal, but what about external opclasses? And without the fix
> the indexes are effectively broken. Fixing this outside in brin.c (in
> the union procedure) fixes this for every opclass procedure, without any
> actual limitation of functinality (14+ does that anyway).

About the hypothetical question, you could as well ask what about
unicorns.  I have never seen any hint that any external opclass exist.
I am all for maintaining compatibility, but I think this concern is
overblown for BRIN.  Anyway, I think your proposed fix is better than
changing individual 'union' support procs, so it doesn't matter.

As far as I understood, you're now worried that there will be an
incompatibility because we will fail to call the 'union' procedure in
cases where we previously called it?  In other words, you fear that some
hypothetical opclass was handling the NULL values in some way that's
incompatible with this?  I haven't thought terribly hard about this, but
I can't see a way for this to cause incompatibilities.

> But maybe someone thinks this is a bad idea and we should do something
> else in the backbranches?

I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT]
NULL handling from BRIN support functions") is better than what was
there before, so I don't object to backpatching it now that we know it's
necessary to fix a bug, and also we have field experience that the
approach is solid.

The attached patch is just a pointer to comments that I think need light
edition.  There's also a typo "bot" (for "both") in a comment that I
think would go away if you accept my suggestion to store 'empty' at the
tuple level.  Note that I worked with the REL_14_STABLE sources, because
for some reason I thought that that was the newest that needed
backpatching of 72ccf55cb99c, but now that I'm finishing this email I
realize that I should have used 13 instead /facepalm

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La persona que no quería pecar / estaba obligada a sentarse
 en duras y empinadas sillas    / desprovistas, por cierto
 de blandos atenuantes"                          (Patricio Vogel)

Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:

On 3/3/23 11:32, Alvaro Herrera wrote:
> 
> Thanks for doing all this.  (Do I understand correctly that this patch
> is not in the commitfest?)
> 
> I think my mental model for this was that "allnulls" meant that either
> there are no values for the column in question or that the values were
> all nulls (For minmax without NULL handling, which is where this all
> started, these two things are essentially the same: the range is not to
> be returned.  So this became a bug the instant I added handling for NULL
> values.)  I failed to realize that these were two different things, and
> this is likely the origin of all these troubles.
> 
> What do you think of using the unused bit in BrinTuple->bt_info to
> denote a range that contains no heap tuples?  This also means we need it
> in BrinMemTuple, I think we can do this:
> 
> @@ -44,6 +44,7 @@ typedef struct BrinValues
>  typedef struct BrinMemTuple
>  {
>      bool        bt_placeholder; /* this is a placeholder tuple */
> +    bool        bt_empty_range;    /* range has no tuples */
>      BlockNumber bt_blkno;        /* heap blkno that the tuple is for */
>      MemoryContext bt_context;    /* memcxt holding the bt_columns values */
>      /* output arrays for brin_deform_tuple: */
> @@ -69,7 +70,7 @@ typedef struct BrinTuple
>       *
>       * 7th (high) bit: has nulls
>       * 6th bit: is placeholder tuple
> -     * 5th bit: unused
> +     * 5th bit: range has no tuples
>       * 4-0 bit: offset of data
>       * ---------------
>       */
> @@ -82,7 +83,7 @@ typedef struct BrinTuple
>   * bt_info manipulation macros
>   */
>  #define BRIN_OFFSET_MASK        0x1F
> -/* bit 0x20 is not used at present */
> +#define BRIN_EMPTY_RANGE        0x20
>  #define BRIN_PLACEHOLDER_MASK    0x40
>  #define BRIN_NULLS_MASK            0x80
>  
> (Note that bt_empty_range uses a hole in the struct, so there's no ABI
> change.)
> 
> This is BRIN-tuple-level, not column-level, so conceptually it seems
> more appropriate.  (In the case where both are empty in union_tuples, we
> can return without entering the per-attribute loop at all, though I
> admit it's not a very interesting case.)  This approach avoids having to
> invent the strange combination of all+has to mean empty.
> 

Oh, that's an interesting idea! I haven't realized there's an unused bit
at the tuple level, and I absolutely agree it'd be a better match than
having this in individual summaries (like now).

It'd mean we'd not have the option to fix this withing the opclasses,
because we only pass them the BrinValue and not the tuple. But if you
think that's reasonable, that'd be OK.

The other thing I was unsure is if the bit could be set for any existing
tuples, but AFAICS that shouldn't be possible - brin_form_tuple does
palloc0, so it should be 0.

I suspect doing this might make the patch quite a bit simpler, actually.

> 
> On 2023-Feb-24, Tomas Vondra wrote:
> 
>> I wonder what's the best
>> way to test this in an automated way - it's very dependent on timing of
>> the concurrent updated. For example we need to do something like this:
>>
>>     T1: run pg_summarize_range() until it inserts the placeholder tuple
>>     T2: do an insert into the page range (updates placeholder)
>>     T1: continue pg_summarize_range() to merge into the placeholder
>>
>> But there are no convenient ways to do this, I think. I had to check the
>> various cases using breakpoints in gdb etc.
> 
> Yeah, I struggled with this during initial development but in the end
> did nothing.  I think we would need to introduce some new framework,
> perhaps Korotkov stop-events stuff at 
> https://postgr.es/m/CAPpHfdsTeb+hBT5=qxghjNG_cHcJLDaNQ9sdy9vNwBF2E2PuZA@mail.gmail.com
> which seemed to me a good fit -- we would add a stop point after the
> placeholder tuple is inserted.
> 

Yeah, but we don't have that at the moment. I actually ended up adding a
couple sleeps during development, which allowed me to hit just the right
order of operations - a poor-man's version of those stop-events. Did
work but hardly an acceptable approach.

>> I'm not very happy with the union_tuples() changes - it's quite verbose,
>> perhaps a bit too verbose. We have to check for empty ranges first, and
>> then various combinations of allnulls/hasnulls flags for both BRIN
>> tuples. There are 9 combinations, and the current code just checks them
>> one by one - I was getting repeatedly confused by the original code, but
>> maybe it's too much.
> 
> I think it's okay.  I tried to make it more compact (by saying "these
> two combinations here are case 2, and these two other are case 4", and
> keeping each of the other combinations a separate case; so there are
> really 7 cases).  But that doesn't make it any easier to follow, on the
> contrary it was more convoluted.  I think a dozen extra lines of source
> is not a problem.
> 

OK

>> The alternative is to apply the same fix to every BRIN_PROCNUM_UNION
>> opclass procedure out there. I guess doing that for minmax+inclusion is
>> not a huge deal, but what about external opclasses? And without the fix
>> the indexes are effectively broken. Fixing this outside in brin.c (in
>> the union procedure) fixes this for every opclass procedure, without any
>> actual limitation of functinality (14+ does that anyway).
> 
> About the hypothetical question, you could as well ask what about
> unicorns.  I have never seen any hint that any external opclass exist.
> I am all for maintaining compatibility, but I think this concern is
> overblown for BRIN.  Anyway, I think your proposed fix is better than
> changing individual 'union' support procs, so it doesn't matter.
> 

OK

> As far as I understood, you're now worried that there will be an
> incompatibility because we will fail to call the 'union' procedure in
> cases where we previously called it?  In other words, you fear that some
> hypothetical opclass was handling the NULL values in some way that's
> incompatible with this?  I haven't thought terribly hard about this, but
> I can't see a way for this to cause incompatibilities.
> 

Yeah, the possible incompatibility is one concern - I have a hard time
imagining such an opclass, because it'd have to handle NULLs in some
strange way. But and as you noted, we're not aware of any external BRIN
opclasses, so maybe this is OK.

The other concern is more generic - as I mentioned, moving the NULL
handling from opclasses to brin.c is what we did in PG14, so this feels
a bit like a backport, and I dislike that a little bit.

>> But maybe someone thinks this is a bad idea and we should do something
>> else in the backbranches?
> 
> I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT]
> NULL handling from BRIN support functions") is better than what was
> there before, so I don't object to backpatching it now that we know it's
> necessary to fix a bug, and also we have field experience that the
> approach is solid.
> 

OK, good to hear.

> The attached patch is just a pointer to comments that I think need light
> edition.  There's also a typo "bot" (for "both") in a comment that I
> think would go away if you accept my suggestion to store 'empty' at the
> tuple level.  Note that I worked with the REL_14_STABLE sources, because
> for some reason I thought that that was the newest that needed
> backpatching of 72ccf55cb99c, but now that I'm finishing this email I
> realize that I should have used 13 instead /facepalm
> 

Thanks. I'll try to rework the patches to use the bt_info unused bit,
and report back in a week or two.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
Hi,

It took me a while but I finally got back to reworking this to use the
bt_info bit, as proposed by Alvaro. And it turned out to work great,
because (a) it's a tuple-level flag, i.e. the right place, and (b) it
does not overload existing flags.

This greatly simplified the code in add_values_to_range and (especially)
union_tuples, making it much easier to understand, I think.

One disadvantage is we are unable to see which ranges are empty in
current pageinspect, but 0002 addresses that by adding "empty" column to
the brin_page_items() output. That's a matter for master only, though.
It's a trivial patch and it makes it easier/possible to test this, so we
should consider to squeeze it into PG16.

I did quite a bit of testing - the attached 0003 adds extra tests, but I
don't propose to get this committed as is - it's rather overkill. Maybe
some reduced version of it ...

The hardest thing to test is the union_tuples() part, as it requires
concurrent operations with "correct" timing. Easy to simulate by
breakpoints in GDB, not so much in plain regression/TAP tests.

There's also a stress tests, doing a lot of randomized summarizations,
etc. Without the fix this failed in maybe 30% of runs, now I did ~100
runs without a single failure.

I haven't done any backporting, but I think it should be simpler than
with the earlier approach. I wonder if we need to care about starting to
use the previously unused bit - I don't think so, in the worst case
we'll just ignore it, but maybe I'm missing something (e.g. when using
physical replication).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
Hi,

here's an updated version of the patch, including a backport version. I
ended up making the code yet a bit closer to master by introducing
add_values_to_range(). The current pre-14 code has the loop adding data
to the BRIN tuple in two places, but with the "fixed" logic handling
NULLs and the empty_range flag the amount of duplicated code got too
high, so this seem reasonable.

Both cases have a patch extending pageinspect to show the new flag, but
obviously we should commit that only in master. In backbranches it's
meant only to make testing easier.

I plan to do a bit more testing, I'd welcome some feedback - it's a
long-standing bug, and it'd be good to finally get this fixed. I don't
think the patch can be made any simpler.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Alvaro Herrera
Date:
On 2023-Apr-23, Tomas Vondra wrote:

> here's an updated version of the patch, including a backport version. I
> ended up making the code yet a bit closer to master by introducing
> add_values_to_range(). The current pre-14 code has the loop adding data
> to the BRIN tuple in two places, but with the "fixed" logic handling
> NULLs and the empty_range flag the amount of duplicated code got too
> high, so this seem reasonable.

In backbranches, the new field to BrinMemTuple needs to be at the end of
the struct, to avoid ABI breakage.

There's a comment in add_values_to_range with a typo "If the range was had".
Also, "So we should not see empty range that was not modified" should
perhaps be "should not see an empty range".

(As for your FIXME comment in brin_memtuple_initialize, I think you're
correct: we definitely need to reset bt_placeholder.  Otherwise, we risk
places that call it in a loop using a BrinMemTuple with one range with
the flag set, in a range where that doesn't hold.  It might be
impossible for this to happen, given how narrow the conditions are on
which bt_placeholder is used; but it seems safer to reset it anyway.)

Some pgindent noise would be induced by this patch.  I think it's worth
cleaning up ahead of time.

I did a quick experiment of turning the patches over -- applying test
first, code fix after (requires some conflict fixing).  With that I
verified that the behavior of 'hasnulls' indeed changes with the code
fix.

> Both cases have a patch extending pageinspect to show the new flag, but
> obviously we should commit that only in master. In backbranches it's
> meant only to make testing easier.

In backbranches, I think it should be reasonably easy to add a
--1.7--1.7.1.sql file and set the default version to 1.7.1; that then
enables us to have the functionality (and the tests) in older branches
too.  If you just add a --1.X.1--1.12.sql version to each branch that's
identical to that branch's current pageinspect version upgrade script,
that would let us have working upgrade paths for all branches.  This is
a bit laborious but straightforward enough.

If you don't feel like adding that, I volunteer to add the necessary
scripts to all branches after you commit the bugfix, and ensure that all
upgrade paths work correctly.

> I plan to do a bit more testing, I'd welcome some feedback - it's a
> long-standing bug, and it'd be good to finally get this fixed. I don't
> think the patch can be made any simpler.

The approach looks good to me.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Apr-23, Tomas Vondra wrote:
>> Both cases have a patch extending pageinspect to show the new flag, but
>> obviously we should commit that only in master. In backbranches it's
>> meant only to make testing easier.

> In backbranches, I think it should be reasonably easy to add a
> --1.7--1.7.1.sql file and set the default version to 1.7.1; that then
> enables us to have the functionality (and the tests) in older branches
> too.  If you just add a --1.X.1--1.12.sql version to each branch that's
> identical to that branch's current pageinspect version upgrade script,
> that would let us have working upgrade paths for all branches.  This is
> a bit laborious but straightforward enough.

"A bit laborious"?  That seems enormously out of proportion to the
benefit of putting this test case into back branches.  It will have
costs for end users too, not only us.  As an example, it would
unecessarily block some upgrade paths, if the upgraded-to installation
is slightly older and lacks the necessary --1.X.1--1.12 script.

> If you don't feel like adding that, I volunteer to add the necessary
> scripts to all branches after you commit the bugfix, and ensure that all
> upgrade paths work correctly.

I do not think this should happen at all, whether you're willing to
do the work or not.

            regards, tom lane



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:

On 4/24/23 17:46, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> On 2023-Apr-23, Tomas Vondra wrote:
>>> Both cases have a patch extending pageinspect to show the new flag, but
>>> obviously we should commit that only in master. In backbranches it's
>>> meant only to make testing easier.
> 
>> In backbranches, I think it should be reasonably easy to add a
>> --1.7--1.7.1.sql file and set the default version to 1.7.1; that then
>> enables us to have the functionality (and the tests) in older branches
>> too.  If you just add a --1.X.1--1.12.sql version to each branch that's
>> identical to that branch's current pageinspect version upgrade script,
>> that would let us have working upgrade paths for all branches.  This is
>> a bit laborious but straightforward enough.
> 
> "A bit laborious"?  That seems enormously out of proportion to the
> benefit of putting this test case into back branches.  It will have
> costs for end users too, not only us.  As an example, it would
> unecessarily block some upgrade paths, if the upgraded-to installation
> is slightly older and lacks the necessary --1.X.1--1.12 script.
> 

Why would that block the upgrade? Presumably we'd add two upgrade
scripts in the master, to allow upgrade both from 1.X and 1.X.1.

>> If you don't feel like adding that, I volunteer to add the necessary
>> scripts to all branches after you commit the bugfix, and ensure that all
>> upgrade paths work correctly.
> 
> I do not think this should happen at all, whether you're willing to
> do the work or not.

FWIW I'm fine with not doing that. As mentioned, I only included this
patch to make testing the patch easier (otherwise the flag is impossible
to inspect directly).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 4/24/23 17:46, Tom Lane wrote:
>> "A bit laborious"?  That seems enormously out of proportion to the
>> benefit of putting this test case into back branches.  It will have
>> costs for end users too, not only us.  As an example, it would
>> unecessarily block some upgrade paths, if the upgraded-to installation
>> is slightly older and lacks the necessary --1.X.1--1.12 script.

> Why would that block the upgrade? Presumably we'd add two upgrade
> scripts in the master, to allow upgrade both from 1.X and 1.X.1.

It would for example block updating from 14.8 or later to 15.2, since
a 15.2 installation would not have the script to update from 1.X.1.

Yeah, people could work around that by only installing the latest
version, but there are plenty of real-world scenarios where you'd be
creating friction, or at least confusion.  I do not think that this
test case is worth it.

            regards, tom lane



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:

On 4/24/23 23:05, Tomas Vondra wrote:
> 
> 
> On 4/24/23 17:46, Tom Lane wrote:
>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>>> On 2023-Apr-23, Tomas Vondra wrote:
>>>> Both cases have a patch extending pageinspect to show the new flag, but
>>>> obviously we should commit that only in master. In backbranches it's
>>>> meant only to make testing easier.
>>
>>> In backbranches, I think it should be reasonably easy to add a
>>> --1.7--1.7.1.sql file and set the default version to 1.7.1; that then
>>> enables us to have the functionality (and the tests) in older branches
>>> too.  If you just add a --1.X.1--1.12.sql version to each branch that's
>>> identical to that branch's current pageinspect version upgrade script,
>>> that would let us have working upgrade paths for all branches.  This is
>>> a bit laborious but straightforward enough.
>>
>> "A bit laborious"?  That seems enormously out of proportion to the
>> benefit of putting this test case into back branches.  It will have
>> costs for end users too, not only us.  As an example, it would
>> unecessarily block some upgrade paths, if the upgraded-to installation
>> is slightly older and lacks the necessary --1.X.1--1.12 script.
>>
> 
> Why would that block the upgrade? Presumably we'd add two upgrade
> scripts in the master, to allow upgrade both from 1.X and 1.X.1.
> 

D'oh! I just realized I misunderstood the issue. Yes, if the target
version is missing the new script, that won't work. I'm not sure how
likely that is - in my experience people refresh versions at the same
time - but it's certainly an assumption we shouldn't do, I guess.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
On 4/24/23 17:36, Alvaro Herrera wrote:
> On 2023-Apr-23, Tomas Vondra wrote:
> 
>> here's an updated version of the patch, including a backport version. I
>> ended up making the code yet a bit closer to master by introducing
>> add_values_to_range(). The current pre-14 code has the loop adding data
>> to the BRIN tuple in two places, but with the "fixed" logic handling
>> NULLs and the empty_range flag the amount of duplicated code got too
>> high, so this seem reasonable.
> 
> In backbranches, the new field to BrinMemTuple needs to be at the end of
> the struct, to avoid ABI breakage.
> 

Good point.

> There's a comment in add_values_to_range with a typo "If the range was had".
> Also, "So we should not see empty range that was not modified" should
> perhaps be "should not see an empty range".
> 

OK, I'll check the wording of the comments.

> (As for your FIXME comment in brin_memtuple_initialize, I think you're
> correct: we definitely need to reset bt_placeholder.  Otherwise, we risk
> places that call it in a loop using a BrinMemTuple with one range with
> the flag set, in a range where that doesn't hold.  It might be
> impossible for this to happen, given how narrow the conditions are on
> which bt_placeholder is used; but it seems safer to reset it anyway.)
> 

Yeah. But isn't that a separate preexisting issue, strictly speaking?

> Some pgindent noise would be induced by this patch.  I think it's worth
> cleaning up ahead of time.
> 

True. Will do.

> I did a quick experiment of turning the patches over -- applying test
> first, code fix after (requires some conflict fixing).  With that I
> verified that the behavior of 'hasnulls' indeed changes with the code
> fix.
> 

Thanks. Could you do some testing of the union_tuples stuff too? It's a
bit tricky part - the behavior is timing sensitive, so testing it
requires gdb breakpoints breakpoints or something like that. I think
it's correct, but it'd be nice to check that.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Missing update of all_hasnulls in BRIN opclasses

From
Alvaro Herrera
Date:
On 2023-Apr-24, Tomas Vondra wrote:

> On 4/24/23 17:36, Alvaro Herrera wrote:

> > (As for your FIXME comment in brin_memtuple_initialize, I think you're
> > correct: we definitely need to reset bt_placeholder.  Otherwise, we risk
> > places that call it in a loop using a BrinMemTuple with one range with
> > the flag set, in a range where that doesn't hold.  It might be
> > impossible for this to happen, given how narrow the conditions are on
> > which bt_placeholder is used; but it seems safer to reset it anyway.)
> 
> Yeah. But isn't that a separate preexisting issue, strictly speaking?

Yes.

> > I did a quick experiment of turning the patches over -- applying test
> > first, code fix after (requires some conflict fixing).  With that I
> > verified that the behavior of 'hasnulls' indeed changes with the code
> > fix.
> 
> Thanks. Could you do some testing of the union_tuples stuff too? It's a
> bit tricky part - the behavior is timing sensitive, so testing it
> requires gdb breakpoints breakpoints or something like that. I think
> it's correct, but it'd be nice to check that.

I'll have a look.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
       more or less, right?
<crab> i.e., "deadly poison"



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:

On 4/24/23 23:20, Tomas Vondra wrote:
> On 4/24/23 17:36, Alvaro Herrera wrote:
>> On 2023-Apr-23, Tomas Vondra wrote:
>>
>>> here's an updated version of the patch, including a backport version. I
>>> ended up making the code yet a bit closer to master by introducing
>>> add_values_to_range(). The current pre-14 code has the loop adding data
>>> to the BRIN tuple in two places, but with the "fixed" logic handling
>>> NULLs and the empty_range flag the amount of duplicated code got too
>>> high, so this seem reasonable.
>>
>> In backbranches, the new field to BrinMemTuple needs to be at the end of
>> the struct, to avoid ABI breakage.
>>

Unfortunately, this is not actually possible :-(

The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
place anything after it. I think we have three options:

a) some other approach? - I really can't see any, except maybe for going
back to the previous approach (i.e. encoding the info using the existing
BrinValues allnulls/hasnulls flags)

b) encoding the info in existing BrinMemTuple flags - e.g. we could use
bt_placeholder to store two bits, not just one. Seems a bit ugly.

c) ignore the issue - AFAICS this would be an issue only for (external)
code accessing BrinMemTuple structs, but I don't think we're aware of
any out-of-core BRIN opclasses or anything like that ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Missing update of all_hasnulls in BRIN opclasses

From
Julien Rouhaud
Date:
Hi,

On Sun, May 07, 2023 at 12:13:07AM +0200, Tomas Vondra wrote:
>
> c) ignore the issue - AFAICS this would be an issue only for (external)
> code accessing BrinMemTuple structs, but I don't think we're aware of
> any out-of-core BRIN opclasses or anything like that ...

FTR there's at least postgis that implments BRIN opclasses (for geometries and
geographies), but there's nothing fancy in the implementation and it doesn't
access BrinMemTuple struct.



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:

On 5/7/23 07:08, Julien Rouhaud wrote:
> Hi,
> 
> On Sun, May 07, 2023 at 12:13:07AM +0200, Tomas Vondra wrote:
>>
>> c) ignore the issue - AFAICS this would be an issue only for (external)
>> code accessing BrinMemTuple structs, but I don't think we're aware of
>> any out-of-core BRIN opclasses or anything like that ...
> 
> FTR there's at least postgis that implments BRIN opclasses (for geometries and
> geographies), but there's nothing fancy in the implementation and it doesn't
> access BrinMemTuple struct.

Right. I believe that should be fine, because opclasses don't access the
tuple directly - instead we pass pointers to individual pieces. But
maybe it'd be a good idea to test this.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Missing update of all_hasnulls in BRIN opclasses

From
Alvaro Herrera
Date:
On 2023-May-07, Tomas Vondra wrote:

> > Álvaro wrote:
> >> In backbranches, the new field to BrinMemTuple needs to be at the end of
> >> the struct, to avoid ABI breakage.
> 
> Unfortunately, this is not actually possible :-(
> 
> The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
> place anything after it. I think we have three options:
> 
> a) some other approach? - I really can't see any, except maybe for going
> back to the previous approach (i.e. encoding the info using the existing
> BrinValues allnulls/hasnulls flags)

Actually, mine was quite the stupid suggestion: the BrinMemTuple already
has a 3 byte hole in the place where you originally wanted to add the
flag:

struct BrinMemTuple {
    _Bool                      bt_placeholder;       /*     0     1 */

    /* XXX 3 bytes hole, try to pack */

    BlockNumber                bt_blkno;             /*     4     4 */
    MemoryContext              bt_context;           /*     8     8 */
    Datum *                    bt_values;            /*    16     8 */
    _Bool *                    bt_allnulls;          /*    24     8 */
    _Bool *                    bt_hasnulls;          /*    32     8 */
    BrinValues                 bt_columns[];         /*    40     0 */

    /* size: 40, cachelines: 1, members: 7 */
    /* sum members: 37, holes: 1, sum holes: 3 */
    /* last cacheline: 40 bytes */
};

so putting it there was already not causing any ABI breakage.  So, the
solution to this problem of not being able to put it at the end is just
to return the struct to your original formulation.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)



Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
On 5/15/23 12:06, Alvaro Herrera wrote:
> On 2023-May-07, Tomas Vondra wrote:
> 
>>> Álvaro wrote:
>>>> In backbranches, the new field to BrinMemTuple needs to be at the end of
>>>> the struct, to avoid ABI breakage.
>>
>> Unfortunately, this is not actually possible :-(
>>
>> The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
>> place anything after it. I think we have three options:
>>
>> a) some other approach? - I really can't see any, except maybe for going
>> back to the previous approach (i.e. encoding the info using the existing
>> BrinValues allnulls/hasnulls flags)
> 
> Actually, mine was quite the stupid suggestion: the BrinMemTuple already
> has a 3 byte hole in the place where you originally wanted to add the
> flag:
> 
> struct BrinMemTuple {
>     _Bool                      bt_placeholder;       /*     0     1 */
> 
>     /* XXX 3 bytes hole, try to pack */
> 
>     BlockNumber                bt_blkno;             /*     4     4 */
>     MemoryContext              bt_context;           /*     8     8 */
>     Datum *                    bt_values;            /*    16     8 */
>     _Bool *                    bt_allnulls;          /*    24     8 */
>     _Bool *                    bt_hasnulls;          /*    32     8 */
>     BrinValues                 bt_columns[];         /*    40     0 */
> 
>     /* size: 40, cachelines: 1, members: 7 */
>     /* sum members: 37, holes: 1, sum holes: 3 */
>     /* last cacheline: 40 bytes */
> };
> 
> so putting it there was already not causing any ABI breakage.  So, the
> solution to this problem of not being able to put it at the end is just
> to return the struct to your original formulation.
> 

Thanks, that's pretty lucky. It means we're not breaking on-disk format
nor ABI, which is great.

Attached is a final version of the patches - I intend to do a bit more
testing, go through the comments once more, and get this committed today
or perhaps tomorrow morning, so that it gets into beta1.

Unfortunately, while polishing the patches, I realized union_tuples()
has yet another long-standing bug with handling NULL values, because it
does this:

    /* Adjust "hasnulls". */
    if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
        col_a->bv_hasnulls = true;

but let's assume "col_a" is a summary representing "1" and "col_b"
represents NULL (col_b->bv_hasnulls=false col_b->bv_allnulls=true).
Well, in that case we fail to "remember" col_a should represent NULL
values too :-(

This is somewhat separate issue, because it's unrelated to empty ranges
(neither of the two ranges is empty). It's hard to test it, because it
requires a particular timing of the concurrent actions, but a breakpoint
in brin.c on the brin_can_do_samepage_update call (in summarize_range)
does the trick for manual testing.

0001 fixes the issue. 0002 is the original fix, and 0003 is just the
pageinspect changes (for master only).

For the backbranches, I thought about making the code more like master
(by moving some of the handling from opclasses to brin.c), but decided
not to. It'd be low-risk, but it feels wrong to kinda do what the master
does under "oi_regular_nulls" flag.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Missing update of all_hasnulls in BRIN opclasses

From
Tomas Vondra
Date:
On 5/18/23 20:45, Tomas Vondra wrote:
> ...
>
> 0001 fixes the issue. 0002 is the original fix, and 0003 is just the
> pageinspect changes (for master only).
> 
> For the backbranches, I thought about making the code more like master
> (by moving some of the handling from opclasses to brin.c), but decided
> not to. It'd be low-risk, but it feels wrong to kinda do what the master
> does under "oi_regular_nulls" flag.
> 

I've now pushed all these patches into relevant branches, after some
minor last-minute tweaks, and so far it didn't cause any buildfarm
issues. Assuming this fully fixes the NULL-handling for BRIN, this
leaves just the deadlock issue discussed in [1].

It seems rather unfortunate all these issues went unnoticed / unreported
essentially since BRIN was introduced in 9.5. To some extent it might be
explained by fairly low likelihood of actually hitting the issue (just
the right timing, concurrency with summarization, NULL values, ...). It
took me quite a bit of time and luck to (accidentally) hit these issues
while stress testing the code.

But there's also the problem of writing tests for this kind of thing. To
exercise the interesting parts (e.g. the union_tuples), it's necessary
to coordinate the order of concurrent steps - but what's a good generic
way to do that (which we could do in TAP tests)? In manual testing it's
doable by setting breakpoints on a particular lines, and step through
the concurrent processes that way.

But that doesn't seem like a particularly great solution for regression
tests. I can imagine adding some sort of "probes" into the code and then
attaching breakpoints to those, but surely we're not the first project
needing this ...


regards

[1]
https://www.postgresql.org/message-id/261e68bc-f5f5-5234-fb2c-af4f583513c0@enterprisedb.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company