Re: Performance issues with parallelism and LIMIT - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Performance issues with parallelism and LIMIT
Date
Msg-id c3ab3c44-5fb3-4f9c-e5ab-d9af536927a7@enterprisedb.com
Whole thread Raw
In response to Re: Performance issues with parallelism and LIMIT  (David Geier <geidav.pg@gmail.com>)
Responses Re: Performance issues with parallelism and LIMIT
List pgsql-hackers
On 2/20/23 19:18, David Geier wrote:
> Hi,
> 
> On 2/8/23 11:42, Tomas Vondra wrote:
>> On 2/1/23 14:41, David Geier wrote:
>>
>> Yeah, this is a pretty annoying regression. We already can hit poor
>> behavior when matching rows are not distributed uniformly in the tables
>> (which is what LIMIT costing assumes), and this makes it more likely to
>> hit similar issues. A bit like when doing many HTTP requests makes it
>> more likely to hit at least one 99% outlier.
> Are you talking about the use of ordering vs filtering indexes in
> queries where there's both an ORDER BY and a filter present (e.g. using
> an ordering index but then all rows passing the filter are at the end of
> the table)? If not, can you elaborate a bit more on that and maybe give
> an example.

Yeah, roughly. I don't think the explicit ORDER BY is a requirement for
this to happen - it's enough when the part of the plan below LIMIT
produces many rows, but the matching rows are at the end.

>> No opinion on these options, but worth a try. Alternatively, we could
>> try the usual doubling approach - start with a low threshold (and set
>> the latch frequently), and then gradually increase it up to the 1/4.
>>
>> That should work both for queries expecting only few rows and those
>> producing a lot of data.
> 
> I was thinking about this variant as well. One more alternative would be
> latching the leader once a worker has produced 1/Nth of the LIMIT where
> N is the number of workers. Both variants have the disadvantage that
> there are still corner cases where the latch is set too late; but it
> would for sure be much better than what we have today.
> 
> I also did some profiling and - at least on my development laptop with 8
> physical cores - the original example, motivating the batching change is
> slower than when it's disabled by commenting out:
> 
>     if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))
> 
> SET parallel_tuple_cost TO 0;
> CREATE TABLE b (a int);
> INSERT INTO b SELECT generate_series(1, 200000000);
> ANALYZE b;
> EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM b;
> 
>  Gather  (cost=1000.00..1200284.61 rows=200375424 width=4) (actual
> rows=200000000 loops=1)
>    Workers Planned: 7
>    Workers Launched: 7
>    ->  Parallel Seq Scan on b  (cost=0.00..1199284.61 rows=28625061
> width=4) (actual rows=25000000 loops=8)
> 
> Always latch: 19055 ms
> Batching:     19575 ms
> 
> If I find some time, I'll play around a bit more and maybe propose a patch.
> 

OK. Once you have a WIP patch maybe share it and I'll try to do some
profiling too.

>>> ...
>>>
>>> We would need something similar to CHECK_FOR_INTERRUPTS() which returns
>>> a NULL slot if a parallel worker is supposed to stop execution (we could
>>> e.g. check if the queue got detached). Or could we amend
>>> CHECK_FOR_INTERRUPTS() to just stop the worker gracefully if the queue
>>> got detached?
>>>
>> That sounds reasonable, but I'm not very familiar the leader-worker
>> communication, so no opinion on how it should be done.
> 
> I think an extra macro that needs to be called from dozens of places to
> check if parallel execution is supposed to end is the least preferred
> approach. I'll read up more on how CHECK_FOR_INTERRUPTS() works and if
> we cannot actively signal the workers that they should stop.
> 

IMHO if this requires adding another macro to a bunch of ad hoc places
is rather inconvenient. It'd be much better to fix this in a localized
manner (especially as it seems related to a fairly specific place).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)
Next
From: Heikki Linnakangas
Date:
Subject: Re: Experiments with Postgres and SSL