Home > mailing lists

Re: PoC: Partial sort - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: PoC: Partial sort
Date	December 1, 2016 20:05:42
Msg-id	CA+TgmoZapyHRm7NVyuyZ+yAV=U1a070BOgRe7PkgyrAegR4JDA@mail.gmail.com Whole thread Raw
In response to	Re: PoC: Partial sort (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: PoC: Partial sort (Haribabu Kommi <kommi.haribabu@gmail.com>)
List	pgsql-hackers

Tree view

On Tue, Sep 13, 2016 at 4:32 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>> <aekorotkov@gmail.com> wrote:
>> > Hmm... I'm not completely agree with that. In typical usage partial sort
>> > should definitely use quicksort.  However, fallback to other sort
>> > methods is
>> > very useful.  Decision of partial sort usage is made by planner.  But
>> > planner makes mistakes.  For example, our HashAggregate is purely
>> > in-memory.
>> > In the case of planner mistake it causes OOM.  I met such situation in
>> > production and not once.  This is why I'd like partial sort to have
>> > graceful
>> > degradation for such cases.
>>
>> I think that this should be moved to the next CF, unless a committer
>> wants to pick it up today.
>
> Patch was rebased to current master.

Just a few quick observations on this...

It strikes me that the API contract change in ExecMaterializesOutput
is pretty undesirable.  I think it would be better to have a new
executor node for this node rather than overloading the existing
"Sort" node, sharing code where possible of course.  The fact that
this would distinguish them more clearly in an EXPLAIN plan seems
good, too.  "Partial Sort" is the obvious thing, but there might be
even better alternatives -- maybe "Incremental Sort" or something like
that?  Because it's not partially sorting the data, it's making data
that already has some sort order have a more rigorous sort order.

I think that it will probably be pretty common to have queries where
the data is sorted by (mostly_unique_col) and we want to get it sorted
by (mostly_unique_col, disambiguation_col).  In such cases I wonder if
we'll incur a lot of overhead by feeding single tuples to the
tuplesort stuff and performing lots of 1-item sorts.  Not sure if that
case needs any special optimization.

I also think that the "HeapTuple prev" bit in SortState is probably
not the right way of doing things.  I think that should use an
additional TupleTableSlot rather than a HeapTuple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Alvaro Herrera
Date: 01 December 2016, 19:49:36
Subject: Re: Re: [COMMITTERS] pgsql: Build HTML documentation using XSLT stylesheets by default

From: Robert Haas
Date: 01 December 2016, 21:21:52
Subject: Re: s/xlog/wal/ in tools and function names?

Re: PoC: Partial sort - Mailing list pgsql-hackers

Previous

Next