Thread: Re: RFC/PoC: GUC option to enable tuple queue autoflush for parallel workers

From: Francesco Degrassi

Hello, I hope bumping this thread is not frowned upon.
Is there any chance we could get some feedback?

Thanks and best regards

Francesco

On Thu, 26 Sept 2024 at 16:15, Francesco Degrassi
<francesco.degrassi@optionfactory.net> wrote:
>
> Hi all. A brief overview of our use case follows.
>
> We are developing a foreign data wrapper which employs parallel scan
> support and predicate pushdown; given the types of queries we run,
> foreign scans can be very long and often return very few rows.
>
> As the scan can be very long and slow, we'd like to provide partial
> results to the user as rows are being returned. We found two problems
> with that:
> 1. The leader backend would not poll the parallel workers' queues
> until it had itself found a row to return; we worked around this by
> setting `parallel_leader_participation` to off.
> 2. Parallel workers' tuple queues are buffered and are not flushed
> until a certain fill threshold is reached; as our queries yield few
> result rows, these rows would often only be returned at the end
> of the (very long) scan.
>
> The proposal is to add a `parallel_tuplequeue_autoflush` GUC (bool,
> default false) that would force every row returned by a parallel
> worker to be immediately flushed to the leader; this was already the
> case before v15, so the GUC simply allows opting back into the
> previous behaviour.
>
> This would be achieved by configuring an `auto_flush` field on
> `TQueueDestReceiver`, so that `tqueueReceiveSlot` would pass it as
> `force_flush` when calling `shm_mq_send`.
>
> The attached patch, tested on master @ 1ab67c9dfaadda, is a
> tentative PoC implementation.
> Based on feedback, we're available to work on a complete and properly
> documented patch.
>
> Thanks in advance for your consideration.
>
> Regards,
> Francesco