Re: Performance issues with parallelism and LIMIT - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Performance issues with parallelism and LIMIT
Date:
Msg-id: 31811138-6c2e-4035-bcc6-8e873ba05eea@vondra.me
In response to: Re: Performance issues with parallelism and LIMIT  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Performance issues with parallelism and LIMIT
List: pgsql-hackers
On 11/18/25 17:51, Tom Lane wrote:
> David Geier <geidav.pg@gmail.com> writes:
>> On 18.11.2025 16:40, Tomas Vondra wrote:
>>> It'd need code in the parallel-aware scans, i.e. seqscan, bitmap, index.
>>> I don't think you'd need code in other plans, because all parallel plans
>>> have one "driving" table.
> 
>> A sort node for example makes this no longer work. As soon as the sort
>> node pulled all rows from its driving table, the sort node becomes the
>> driving table for its parent nodes. If no more tables are involved in
>> the plan from that point on, early termination no longer works.
> 
> You're assuming that the planner will insert Gather nodes at arbitrary
> places in the plan, which isn't true.  If it does generate plans that
> are problematic from this standpoint, maybe the answer is "don't
> parallelize in exactly that way".
> 

I think David has a point: nodes that "buffer" tuples (like Sort or
HashAgg) would break the approach of making this the responsibility of
the parallel-aware scan. And I don't see anything particularly wrong
with such plans - plans with partial aggregation often look like that.

Maybe this should be the responsibility of execProcnode.c, not the
various nodes?
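
To sketch what I mean (rough, untested, and the names
ExecProcNodeStopCheck and ExecEarlyStopRequested are made up):
execProcnode.c already wraps a node's ExecProcNode callback for
instrumentation, so in a parallel worker it could install a similar
wrapper that checks a "stop" flag the leader sets in shared memory once
it has all the tuples it needs, and simply reports EOF:

/*
 * Hypothetical wrapper, installed by execProcnode.c only in parallel
 * workers (similar to how ExecProcNodeInstr is installed only when
 * instrumentation is enabled).
 */
static TupleTableSlot *
ExecProcNodeStopCheck(PlanState *node)
{
    /*
     * ExecEarlyStopRequested() is a made-up helper that would read a
     * flag the leader sets once the LIMIT is satisfied.
     */
    if (ExecEarlyStopRequested(node->state))
        return NULL;        /* behave as if the node ran out of tuples */

    return node->ExecProcNodeReal(node);
}

The extra per-tuple branch should be cheap, and it'd cover all node
types at once instead of teaching each parallel-aware scan about it.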

It'd be nice to show this in EXPLAIN (that some of the workers were
terminated early, before processing all the data).
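Maybe just an extra line under the Gather / Gather Merge node in
EXPLAIN ANALYZE, something like this (with "Workers Stopped Early"
being the new, hypothetical line):

    ->  Gather Merge
          Workers Planned: 4
          Workers Launched: 4
          Workers Stopped Early: 3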


regards

-- 
Tomas Vondra



