Re: [HACKERS] [PATCH] Incremental sort - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: [HACKERS] [PATCH] Incremental sort
Date
Msg-id CAPpHfdt14t5fsLQ2Pd5m2WQcRZHaHHPNyx9Fko6x_4Uk-Buq4g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [PATCH] Incremental sort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] [PATCH] Incremental sort
List pgsql-hackers
On Sat, Apr 7, 2018 at 11:57 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 04/07/2018 06:23 PM, Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>>> I dunno, how would you estimate whether this is actually a win or not?
>>> I don't think our model of sort costs is anywhere near refined enough
>>> or accurate enough to reliably predict whether this is better than
>>> just doing it in one step.  Even if the cost model is good, it's not
>>> going to be better than our statistics about the number/size of the
>>> groups in the first column(s), and that's a notoriously unreliable stat.
>
>> I think that improvement in cost calculation of sort should be a
>> separate patch, not directly connected to this one. Postpone patches
>> till other part will be ready to get max improvement for postponed ones
>> doesn't seem to me very good, especially if it suggests some improvement
>> right now.
>
> No, you misunderstand the point of my argument.  Without a reasonably
> reliable cost model, this patch could easily make performance *worse*
> not better for many people, due to choosing incremental-sort plans
> where they were really a loss.
>

Yeah. Essentially the patch could push the planner to pick a path that
has low startup cost (and very high total cost), assuming it'll only
need to read small part of the input. But if the number of groups in the
input is low (perhaps just one huge group), that would be a regression.
 
Yes, I think the biggest risk here is too small number of groups.  More
precisely the risk is too large groups while total number of groups might
be large enough.

> If we were at the start of a development cycle and work were being
> promised to be done later in the cycle to improve the planning aspect,
> I'd be more charitable about it.  But this isn't merely the end of a
> cycle, it's the *last day*.  Now is not the time to commit stuff that
> needs, or even just might need, follow-on work.
>

+1 to that

FWIW I'm willing to spend some time on the patch for PG12, particularly
on the planner / costing part. The potential gains are too interesting

Thank you very much for your efforts on reviewing this patch.
I'm looking forward work with you on this feature for PG12.

FWIW, I think that we're moving this patch in the right direction by
providing separate paths for incremental sort.  It's much better than
deciding between full or incremental sort in-place.  For sure, considereing
extra paths might cause planning time regression.  But I think the
same statement is true about many other planning optimizations.
One thing be can do is to make enable_incrementalsort = off by
default.  Then only users who understand improtance of incremental
sort will get extra planning time.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Runtime Partition Pruning