Re: Missing update of all_hasnulls in BRIN opclasses - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Missing update of all_hasnulls in BRIN opclasses
Date
Msg-id a5988b9d-366a-6833-7235-277dadde2a16@enterprisedb.com
Whole thread Raw
In response to Re: Missing update of all_hasnulls in BRIN opclasses  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Missing update of all_hasnulls in BRIN opclasses  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers

On 3/3/23 11:32, Alvaro Herrera wrote:
> 
> Thanks for doing all this.  (Do I understand correctly that this patch
> is not in the commitfest?)
> 
> I think my mental model for this was that "allnulls" meant that either
> there are no values for the column in question or that the values were
> all nulls (For minmax without NULL handling, which is where this all
> started, these two things are essentially the same: the range is not to
> be returned.  So this became a bug the instant I added handling for NULL
> values.)  I failed to realize that these were two different things, and
> this is likely the origin of all these troubles.
> 
> What do you think of using the unused bit in BrinTuple->bt_info to
> denote a range that contains no heap tuples?  This also means we need it
> in BrinMemTuple, I think we can do this:
> 
> @@ -44,6 +44,7 @@ typedef struct BrinValues
>  typedef struct BrinMemTuple
>  {
>      bool        bt_placeholder; /* this is a placeholder tuple */
> +    bool        bt_empty_range;    /* range has no tuples */
>      BlockNumber bt_blkno;        /* heap blkno that the tuple is for */
>      MemoryContext bt_context;    /* memcxt holding the bt_columns values */
>      /* output arrays for brin_deform_tuple: */
> @@ -69,7 +70,7 @@ typedef struct BrinTuple
>       *
>       * 7th (high) bit: has nulls
>       * 6th bit: is placeholder tuple
> -     * 5th bit: unused
> +     * 5th bit: range has no tuples
>       * 4-0 bit: offset of data
>       * ---------------
>       */
> @@ -82,7 +83,7 @@ typedef struct BrinTuple
>   * bt_info manipulation macros
>   */
>  #define BRIN_OFFSET_MASK        0x1F
> -/* bit 0x20 is not used at present */
> +#define BRIN_EMPTY_RANGE        0x20
>  #define BRIN_PLACEHOLDER_MASK    0x40
>  #define BRIN_NULLS_MASK            0x80
>  
> (Note that bt_empty_range uses a hole in the struct, so there's no ABI
> change.)
> 
> This is BRIN-tuple-level, not column-level, so conceptually it seems
> more appropriate.  (In the case where both are empty in union_tuples, we
> can return without entering the per-attribute loop at all, though I
> admit it's not a very interesting case.)  This approach avoids having to
> invent the strange combination of all+has to mean empty.
> 

Oh, that's an interesting idea! I haven't realized there's an unused bit
at the tuple level, and I absolutely agree it'd be a better match than
having this in individual summaries (like now).

It'd mean we'd not have the option to fix this withing the opclasses,
because we only pass them the BrinValue and not the tuple. But if you
think that's reasonable, that'd be OK.

The other thing I was unsure is if the bit could be set for any existing
tuples, but AFAICS that shouldn't be possible - brin_form_tuple does
palloc0, so it should be 0.

I suspect doing this might make the patch quite a bit simpler, actually.

> 
> On 2023-Feb-24, Tomas Vondra wrote:
> 
>> I wonder what's the best
>> way to test this in an automated way - it's very dependent on timing of
>> the concurrent updated. For example we need to do something like this:
>>
>>     T1: run pg_summarize_range() until it inserts the placeholder tuple
>>     T2: do an insert into the page range (updates placeholder)
>>     T1: continue pg_summarize_range() to merge into the placeholder
>>
>> But there are no convenient ways to do this, I think. I had to check the
>> various cases using breakpoints in gdb etc.
> 
> Yeah, I struggled with this during initial development but in the end
> did nothing.  I think we would need to introduce some new framework,
> perhaps Korotkov stop-events stuff at 
> https://postgr.es/m/CAPpHfdsTeb+hBT5=qxghjNG_cHcJLDaNQ9sdy9vNwBF2E2PuZA@mail.gmail.com
> which seemed to me a good fit -- we would add a stop point after the
> placeholder tuple is inserted.
> 

Yeah, but we don't have that at the moment. I actually ended up adding a
couple sleeps during development, which allowed me to hit just the right
order of operations - a poor-man's version of those stop-events. Did
work but hardly an acceptable approach.

>> I'm not very happy with the union_tuples() changes - it's quite verbose,
>> perhaps a bit too verbose. We have to check for empty ranges first, and
>> then various combinations of allnulls/hasnulls flags for both BRIN
>> tuples. There are 9 combinations, and the current code just checks them
>> one by one - I was getting repeatedly confused by the original code, but
>> maybe it's too much.
> 
> I think it's okay.  I tried to make it more compact (by saying "these
> two combinations here are case 2, and these two other are case 4", and
> keeping each of the other combinations a separate case; so there are
> really 7 cases).  But that doesn't make it any easier to follow, on the
> contrary it was more convoluted.  I think a dozen extra lines of source
> is not a problem.
> 

OK

>> The alternative is to apply the same fix to every BRIN_PROCNUM_UNION
>> opclass procedure out there. I guess doing that for minmax+inclusion is
>> not a huge deal, but what about external opclasses? And without the fix
>> the indexes are effectively broken. Fixing this outside in brin.c (in
>> the union procedure) fixes this for every opclass procedure, without any
>> actual limitation of functinality (14+ does that anyway).
> 
> About the hypothetical question, you could as well ask what about
> unicorns.  I have never seen any hint that any external opclass exist.
> I am all for maintaining compatibility, but I think this concern is
> overblown for BRIN.  Anyway, I think your proposed fix is better than
> changing individual 'union' support procs, so it doesn't matter.
> 

OK

> As far as I understood, you're now worried that there will be an
> incompatibility because we will fail to call the 'union' procedure in
> cases where we previously called it?  In other words, you fear that some
> hypothetical opclass was handling the NULL values in some way that's
> incompatible with this?  I haven't thought terribly hard about this, but
> I can't see a way for this to cause incompatibilities.
> 

Yeah, the possible incompatibility is one concern - I have a hard time
imagining such an opclass, because it'd have to handle NULLs in some
strange way. But and as you noted, we're not aware of any external BRIN
opclasses, so maybe this is OK.

The other concern is more generic - as I mentioned, moving the NULL
handling from opclasses to brin.c is what we did in PG14, so this feels
a bit like a backport, and I dislike that a little bit.

>> But maybe someone thinks this is a bad idea and we should do something
>> else in the backbranches?
> 
> I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT]
> NULL handling from BRIN support functions") is better than what was
> there before, so I don't object to backpatching it now that we know it's
> necessary to fix a bug, and also we have field experience that the
> approach is solid.
> 

OK, good to hear.

> The attached patch is just a pointer to comments that I think need light
> edition.  There's also a typo "bot" (for "both") in a comment that I
> think would go away if you accept my suggestion to store 'empty' at the
> tuple level.  Note that I worked with the REL_14_STABLE sources, because
> for some reason I thought that that was the newest that needed
> backpatching of 72ccf55cb99c, but now that I'm finishing this email I
> realize that I should have used 13 instead /facepalm
> 

Thanks. I'll try to rework the patches to use the bt_info unused bit,
and report back in a week or two.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Allow ordered partition scans in more cases
Next
From: Dean Rasheed
Date:
Subject: Re: Add support for unit "B" to pg_size_pretty()