Re: changing sort_mem on the fly? - Mailing list pgsql-general

From Jim C. Nasby
Subject Re: changing sort_mem on the fly?
Msg-id 20050130220936.GQ64304@decibel.org
In response to Re: changing sort_mem on the fly?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Sun, Jan 30, 2005 at 04:49:39PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > Since the planner knows how many rows will
> > be going into the sort and how wide they are, ISTM it should be able to
> > estimate how much memory will be needed.
>
> ... which is different from how much will be available.  See cost_sort():

OK, I wasn't sure which you were referring to. As I mentioned in the
earlier thread, there would have to be some means of accounting for how
much memory active sorts in the system are using. One possibility is
that a global counter is updated any time a sort allocates or frees
memory. If allocations are done 8k at a time that would probably be too
expensive, but I suspect it wouldn't be too bad with larger allocations
that are done less frequently. And bear in mind that if these changes
prevent even a few sorts an hour from spilling to disk then it's
probably still a net gain.
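
Something along these lines is what I'm picturing (just a sketch with
made-up names, nothing that exists in the backend today): each sort only
touches the global counter when it crosses a fairly large accounting
chunk, so the bookkeeping stays cheap.

    /* Hypothetical sketch: shared counter of memory used by active sorts,
     * updated in 1 MB chunks instead of per-8kB allocation. */
    #include <stddef.h>

    #define SORT_ACCT_CHUNK (1024 * 1024)    /* account in 1 MB steps */

    static size_t total_sort_mem_in_use = 0; /* would live in shared memory */

    /* Called when a sort grows past the amount it has accounted for so far. */
    static void
    sort_mem_account_grow(size_t *accounted, size_t now_used)
    {
        while (*accounted < now_used)
        {
            *accounted += SORT_ACCT_CHUNK;
            total_sort_mem_in_use += SORT_ACCT_CHUNK; /* atomic/locked in reality */
        }
    }

    /* Called when the sort finishes (or spills to disk). */
    static void
    sort_mem_account_release(size_t *accounted)
    {
        total_sort_mem_in_use -= *accounted;
        *accounted = 0;
    }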

>  * If the total volume of data to sort is less than work_mem, we will do
>  * an in-memory sort, which requires no I/O and about t*log2(t) tuple
>  * comparisons for t tuples.
>  *
>  * If the total volume exceeds work_mem, we switch to a tape-style merge
>  * algorithm.  There will still be about t*log2(t) tuple comparisons in
>  * total, but we will also need to write and read each tuple once per
>  * merge pass.    We expect about ceil(log6(r)) merge passes where r is the
>  * number of initial runs formed (log6 because tuplesort.c uses six-tape
>  * merging).  Since the average initial run should be about twice work_mem,
>  * we have
>  *        disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem)))
>  *        cpu = comparison_cost * t * log2(t)
>
> The actual cost of a sort is therefore *highly* sensitive to how much
> memory it is allowed to use.  Blowing off any ability to estimate this
> in advance is not going to improve matters...
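
To put rough numbers on that formula: sorting a 1 GB input with
work_mem = 10 MB gives about 1024/20 = 51 initial runs, so
ceil(log6(51)) = 3 merge passes and roughly 2 * 1 GB * 3 = 6 GB of disk
traffic. Bump work_mem to 100 MB and it drops to a single merge pass and
about 2 GB of traffic, so I take the point that the estimate swings a lot
with how much memory the sort is allowed.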

Doesn't the sort code also have a provision for starting an in-memory
sort, then realizing that it's going to exceed work_mem and switching to
disk at that point?
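
Something like this is how I picture that working (hypothetical names,
not the actual tuplesort.c interface): the sort accumulates tuples in
memory and dumps them to tape the moment the running total crosses its
limit.

    /* Hypothetical sketch of the spill check (not real tuplesort.c code). */
    #include <stddef.h>

    typedef struct SortState
    {
        size_t  mem_used;   /* memory consumed by tuples so far */
        size_t  mem_limit;  /* work_mem, or a dynamically granted amount */
        int     spilled;    /* have we switched to on-disk sorting? */
    } SortState;

    static void
    sort_add_tuple(SortState *state, size_t tuple_size)
    {
        state->mem_used += tuple_size;

        if (!state->spilled && state->mem_used > state->mem_limit)
        {
            /* dump_tuples_to_tape(state); -- begin the tape-merge phase */
            state->spilled = 1;
        }
    }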

Yes, without something as simple as work_mem it is harder to determine
how much memory you can use, though I don't think it's impossible.
First, a global variable/counter could be used to keep track of how much
memory all the active sorts estimate they'll need. That would be a
first-order guess. This could also be compared to how much memory is
actually being used, though I suspect that would require globally
tracking how much memory each sort estimates it will need and how much
it's already using. Of course there would also be some kind of policy
preventing sorts from taking over all of sort_mem; possibly a GUC,
possibly something more dynamic. Whatever it is, that gives you an
upper bound on how much memory would be used at a given point in time.
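
As a very rough sketch of the kind of policy I mean (all names invented
for illustration): a sort asks for what the planner estimated, but is
never granted more than a fraction of whatever is left in the global
pool, and anything beyond the grant spills to disk as it does today.

    /* Hypothetical grant policy, nothing resembling existing backend code. */
    #include <stddef.h>

    static size_t global_sort_pool = 512UL * 1024 * 1024; /* total sort memory */
    static size_t pool_reserved = 0;                       /* sum of active grants */
    static double per_sort_fraction = 0.25;                /* policy knob, e.g. a GUC */

    static size_t
    sort_mem_grant(size_t estimated_need)
    {
        size_t  available = global_sort_pool - pool_reserved;
        size_t  cap = (size_t) (available * per_sort_fraction);
        size_t  grant = (estimated_need < cap) ? estimated_need : cap;

        pool_reserved += grant;     /* needs locking in reality */
        return grant;               /* past this, the sort spills to disk */
    }

    static void
    sort_mem_release(size_t grant)
    {
        pool_reserved -= grant;
    }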

BTW, something that just occurred to me... with the new cache management
code (ARC and whatever's going to replace it), presumably it will now be
better for PostgreSQL to do most of the caching itself instead of leaving
it to the OS. This means you'd want to give PostgreSQL the lion's share
of memory, but with how sorts currently acquire memory, doing so would be
risky. Should there be an option in 8.0 for sorts to use shared memory
instead of allocating their own from a (possibly small) pool of memory
the OS has?
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
