Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date
Msg-id 917c7895-32a8-4f1c-bf56-7fc99b60e7eb@postgrespro.ru
Whole thread Raw
In response to Re: Adding skip scan (including MDAM style range skip scan) to nbtree  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Adding skip scan (including MDAM style range skip scan) to nbtree
List pgsql-hackers
On 12.03.2025 01:59, Peter Geoghegan wrote:
> On Tue, Mar 11, 2025 at 6:24 PM Alena Rybakina
> <a.rybakina@postgrespro.ru> wrote:
>> Hi, reviewing the code I noticed that you removed the
>> parallel_aware check for DSM initialization for BitmapIndexScan,
>> IndexScan, IndexOnlyScan,
>> but you didn't do the same in the ExecParallelReInitializeDSM function
>> and I can't figure out why to be honest. I think it might be wrong or
>> I'm missing something.
> I didn't exactly remove the check -- not completely. You could say
> that I *moved* the check, from the caller (i.e. from functions in
> execParallel.c such as ExecParallelInitializeDSM) to the callee (i.e.
> into individual executor node functions such as
> ExecIndexScanInitializeDSM). I did it that way because it's more
> flexible.
>
> We need this flexibility because we need to allocate DSM for
> instrumentation state when EXPLAIN ANALYZE runs a parallel query with
> an index scan node -- even when the scan node runs inside a parallel
> worker, but is non-parallel-aware (parallel oblivious). Obviously, we
> might also need to allocate space for a shared index scan descriptor
> (including index AM opaque state), in the same way as before
> (or we might need to do both).
I see and agree with this changes now.
>> As I see, it might be necessary if the parallel executor needs to
>> reinitialize the shared memory state before launching a fresh batches of
>> workers (it is based on
>> the comment of the ExecParallelReinitialize function), and when it
>> happens all child nodes reset their state (see the comment next to the
>> call to the ExecParallelReInitializeDSM
>> function).
> I did not move/remove the parallel_aware check in
> ExecParallelReInitializeDSM because it doesn't have the same
> requirements -- we *don't* need that flexibility there, because it
> isn't necessary (or correct) to reinitialize anything when the only
> thing that's in DSM is instrumentation state. (Though I did add an
> assertion about parallel_aware-ness to functions like
> ExecIndexScanReInitializeDSM, which I thought might make this a little
> clearer to people reading files like nodeIndexScan.c, and wondering
> why ExecIndexScanReInitializeDSM doesn't specifically test
> parallel_aware.)
>
> Obviously, these node types don't have their state reset (quoting
> ExecParallelReInitializeDSM switch statement here):
>
>          case T_BitmapIndexScanState:
>          case T_HashState:
>          case T_SortState:
>          case T_IncrementalSortState:
>          case T_MemoizeState:
>              /* these nodes have DSM state, but no reinitialization is
> required */
>              break;
>
> I added T_BitmapIndexScanState to the top of this list -- the rest are
> from before today's commit. I did this since (like the other nodes
> shown) BitmapIndexScan's use of DSM is limited to instrumentation
> state -- which we never want to reset (there is no such thing as a
> parallel bitmap index scan, though bitmap index scans can run in
> parallel workers, and still need the instrumentation to work with
> EXPLAIN ANALYZE).
>
> We still need to call functions like ExecIndexOnlyScanReInitializeDSM
> from here/ExecParallelReInitializeDSM, of course, but that won't reset
> the new instrumentation state (because my patch didn't touch it at
> all, except for adding that assertion I already mentioned in passing).
>
> We actually specifically rely on *not* resetting the shared memory
> state to get correct behavior in cases like this one:
>
> https://www.postgresql.org/message-id/CAAKRu_YjBPfGp85ehY1t9NN%3DR9pB9k%3D6rztaeVkAm-OeTqUK4g%40mail.gmail.com
>
> See comments about this in places like ExecEndBitmapIndexScan (added
> by today's commit), or in ExecEndBitmapHeapScan (added by the similar
> bitmap heap instrumentation patch discussed on that other thread,
> which became commit 5a1e6df3).
>
>
Thank you for the explanation!

Now I see why these changes were made.

After your additional explanations, everything really became clear and I 
fully agree with the current code regarding this part.
However I did not see an explanation to the commit regarding this place, 
as well as a comment next to the assert and the parallel_aware check and 
why BitmapIndexScanState was added in the ExecParallelReInitializeDSM.
In my opinion, there is not enough additional explanation about this in 
the form of comments, although I think that it has already been 
explained here enough for someone who will look at this code.

-- 
Regards,
Alena Rybakina
Postgres Professional




pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: remove open-coded popcount in acl.c
Next
From: Alexander Borisov
Date:
Subject: Re: Optimization for lower(), upper(), casefold() functions.