Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From David Fetter
Subject Re: Using quicksort for every external sort run
Date 2015-11-30 04:02:59
Msg-id 20151130040259.GA22300@fetter.org
In response to Re: Using quicksort for every external sort run  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Using quicksort for every external sort run  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Sat, Nov 28, 2015 at 02:04:16PM -0800, Jeff Janes wrote:
> On Wed, Nov 18, 2015 at 3:29 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > On Wed, Nov 18, 2015 at 10:31 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >
> >> I agree we don't want to optimize for low memory, but I don't think we
> >> should throw it under the bus, either.  Right now we are effectively
> >> saying the CPU-cache problems with the heap start exceeding the larger
> >> run size benefits at 64kb (the smallest allowed setting for work_mem).
> >> While any number we pick is going to be a guess that won't apply to
> >> all hardware, surely we can come up with a guess better than 64kb.
> >> Like, say, 8MB.  If available memory for the sort is 8MB or smaller
> >> and the predicted input size implies a multipass merge, then we can use
> >> the heap method rather than the quicksort method.  Would a rule like
> >> that complicate things much?
> >
> > I'm already using replacement selection for the first run when it is
> > predicted by my new ad-hoc cost model that we can get away with a
> > "quicksort with spillover", avoiding almost all I/O. We only
> > incrementally spill as many tuples as needed right now, but it would
> > be pretty easy to not quicksort the remaining tuples, but continue to
> > incrementally spill everything. So no, it wouldn't be too hard to hang
> > on to the old behavior sometimes, if it looked worthwhile.
> >
> > In principle, I have no problem with doing that. Through testing, I
> > cannot see any actual upside, though. Perhaps I just missed something.
> > Even 8MB is enough to avoid the multipass merge in the event of a
> > surprisingly high volume of data (my work laptop is elsewhere, so I
> > don't have my notes on this in front of me, but I figured out the
> > crossover point for a couple of cases).
> 
> For me, very large sorts (100,000,000 ints) with work_mem below 4MB do
> better unpatched than with your patch series, by about 5%.  Not a
> big deal, but if it is easy to keep the old behavior, then I think
> we should.  Yes, it is dumb to do large sorts with work_mem below 4MB,
> but if you have canned apps that do a mixture of workloads, it is not
> so easy to micromanage their work_mem, especially as there are no
> easy tools that let me, as the DBA, say "if you connect from this IP
> address, you get this work_mem".
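
A minimal sketch of the fallback rule Jeff describes above, in C; the
names, the 8MB cutoff, and the pass-prediction input are hypothetical
illustrations, not actual tuplesort.c code.  (Peter's patch already
keeps the heap for the first run when a "quicksort with spillover" is
predicted; this sketch only covers the low-memory fallback.)

    #include <stdbool.h>
    #include <stddef.h>

    /* Jeff's suggested cutoff: at or below 8MB of sort memory, the
     * heap's longer initial runs may still beat quicksort's cache
     * behavior by avoiding an extra merge pass. */
    #define HEAP_FALLBACK_MEM  ((size_t) 8 * 1024 * 1024)

    /*
     * Use replacement selection (the heap) for run formation only when
     * memory is small AND quicksort's shorter runs are predicted to
     * force a multipass merge; otherwise quicksort every run.
     */
    static bool
    prefer_replacement_selection(size_t sort_mem_bytes,
                                 int predicted_merge_passes)
    {
        return sort_mem_bytes <= HEAP_FALLBACK_MEM &&
               predicted_merge_passes > 1;
    }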

That's certainly doable with pgbouncer, for example.  What would you
have in mind for the more general capability?  It seems to me that
bloating up pg_hba.conf would be undesirable, but maybe I'm picturing
this as bigger than it actually needs to be.
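
For instance, something along these lines in pgbouncer.ini (the
aliases, address, and sizes here are hypothetical); connect_query runs
once on each new server connection, and a bare integer sets work_mem
in kB, which avoids nested quoting:

    [databases]
    ; two aliases for the same server, each pinning a different work_mem
    app_oltp      = host=10.0.0.5 dbname=app connect_query='SET work_mem = 4096'
    app_reporting = host=10.0.0.5 dbname=app connect_query='SET work_mem = 262144'

Clients pick the alias that matches their workload; steering by source
IP would still need something in front of pgbouncer (separate listeners
or firewall rules).  Server-side, ALTER ROLE ... SET work_mem gives
per-user, though not per-IP, control.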

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate