Home > mailing lists

Re: PoC: Partial sort - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: PoC: Partial sort
Date	December 31, 2013 08:18:47
Msg-id	CAApHDvodCHCj9=W8k5huEs6WwxBSbRQq63pwto--bcK+RmcK4g@mail.gmail.com Whole thread
In response to	Re: PoC: Partial sort (Andreas Karlsson <andreas@proxel.se>)
List	pgsql-hackers

Tree view

On Tue, Dec 31, 2013 at 2:41 PM, Andreas Karlsson <andreas@proxel.se> wrote:

On 12/29/2013 08:24 AM, David Rowley wrote:
If it was possible to devise some way to reuse any
previous tuplesortstate perhaps just inventing a reset method which
clears out tuples, then we could see performance exceed the standard
seqscan -> sort. The code the way it is seems to lookup the sort
functions from the syscache for each group then allocate some sort
space, so quite a bit of time is also spent in palloc0() and pfree()

If it was not possible to do this then maybe adding a cost to the number
of sort groups would be better so that the optimization is skipped if
there are too many sort groups.

It should be possible. I have hacked a quick proof of concept for reusing the tuplesort state. Can you try it and see if the performance regression is fixed by this?

One thing which have to be fixed with my patch is that we probably want to close the tuplesort once we have returned the last tuple from ExecSort().

I have attached my patch and the incremental patch on Alexander's patch.

Thanks, the attached is about 5 times faster than it was previously with my test case upthread.

The times now look like:

No pre-sortable index:

Total runtime: 86.278 ms

With pre-sortable index with partial sorting

Total runtime: 47.500 ms

With the query where there is no index the sort remained in memory.

I spent some time trying to find a case where the partial sort is slower than the seqscan -> sort. The only places partial sort seems slower are when the number of estimated sort groups are around the crossover point where the planner would be starting to think about performing a seqscan -> sort instead. I'm thinking right now that it's not worth raising the costs around this as the partial sort is less likely to become a disk sort than the full sort is.

I'll keep going with trying to break it.

Regards

David Rowley

--
Andreas Karlsson

pgsql-hackers by date:

From: Peter Geoghegan
Date: 31 December 2013, 07:18:30
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Simon Riggs
Date: 31 December 2013, 08:48:59
Subject: Re: Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire

Re: PoC: Partial sort - Mailing list pgsql-hackers

Previous

Next