Thread: Releasing memory during External sorting?
I have concerns about whether we are overallocating memory for use in external sorts. (All code relating to this is in tuplesort.c) When we begin a sort we allocate (work_mem | maintenance_work_mem) and attempt to do the sort in memory. If the sort set is too big to fit in memory we then write to disk and begin an external sort. The same memory allocation is used for both types of sort, AFAICS. The external sort algorithm benefits from some memory but not much. Knuth says that the amount of memory required is very low, with a value typically less than 1 kB. I/O overheads mean that there is benefit from having longer sequential writes, so the optimum is much larger than that. I've not seen any data that indicates that a setting higher than 16 MB adds any value at all to a large external sort. I have some indications from private tests that very high memory settings may actually hinder performance of the sorts, though I cannot explain that and wonder whether it is the performance tests themselves that have issues. Does anyone have any clear data that shows the value of large settings of work_mem when the data to be sorted is much larger than memory? (I am well aware of the value of setting work_mem higher for smaller sorts, so any performance data needs to reflect only very large sorts). If not, I would propose that when we move from qsort to tapesort mode we free the larger work_mem setting (if one exists) and allocate only a lower, though still optimal setting for the tapesort. That way the memory can be freed for use by other users or the OS while the tapesort proceeds (which is usually quite a while...). Feedback, please. Best Regards, Simon Riggs
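A minimal sketch of the proposal in C, assuming a much-simplified sort
state; the names here (SortState, switch_to_tape_sort, EXTERNAL_SORT_MEM)
are hypothetical and are not the real tuplesort.c interfaces:

/*
 * Hypothetical sketch only: release the large in-memory workspace when the
 * sort spills to tape, and continue with a smaller budget.  Not tuplesort.c.
 */
#include <stdlib.h>

#define EXTERNAL_SORT_MEM (16 * 1024 * 1024)    /* ~16 MB empirical floor */

typedef struct SortState
{
    size_t  budget;        /* bytes we allow ourselves to use */
    void   *workspace;     /* big in-memory sort area */
} SortState;

/* Called when the data set turns out not to fit in work_mem. */
static void
switch_to_tape_sort(SortState *state, size_t work_mem_bytes)
{
    /* Give back the large qsort workspace ... */
    free(state->workspace);

    /* ... and continue with a smaller, still sequential-I/O-friendly budget. */
    state->budget = (work_mem_bytes < EXTERNAL_SORT_MEM)
        ? work_mem_bytes
        : EXTERNAL_SORT_MEM;
    state->workspace = malloc(state->budget);
    /* run formation and merging would proceed against the smaller workspace */
}

The 16 MB figure is only the empirical breakpoint mentioned in the mail
above, not a derived constant.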
Simon Riggs <simon@2ndquadrant.com> writes:
> If not, I would propose that when we move from qsort to tapesort mode we
> free the larger work_mem setting (if one exists) and allocate only a
> lower, though still optimal setting for the tapesort. That way the
> memory can be freed for use by other users or the OS while the tapesort
> proceeds (which is usually quite a while...).

On most platforms it's quite unlikely that any memory would actually get
released back to the OS before transaction end, because the memory blocks
belonging to the tuplesort context will be intermixed with blocks
belonging to other contexts. So I think this is pretty pointless.

(If you can't afford to have the sort using all of sort_mem, you've set
sort_mem too large, anyway.)

			regards, tom lane
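A small stand-alone C experiment that illustrates the point about the
heap not shrinking; this is not PostgreSQL code, and the outcome depends
on the platform allocator (a glibc-style brk allocator for small blocks
is assumed here):

/*
 * Illustration only: interleave allocations from two "contexts", free one,
 * and observe that the program break usually does not move back, because
 * live blocks from the other context still sit at the top of the heap.
 */
#define _DEFAULT_SOURCE          /* for sbrk() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NBLOCKS 10000

int
main(void)
{
    void *sort_ctx[NBLOCKS];
    void *other_ctx[NBLOCKS];
    int   i;

    void *before = sbrk(0);

    for (i = 0; i < NBLOCKS; i++)
    {
        sort_ctx[i]  = malloc(1024);   /* pretend: tuplesort context */
        other_ctx[i] = malloc(1024);   /* pretend: some other context */
    }
    void *after_alloc = sbrk(0);

    for (i = 0; i < NBLOCKS; i++)
        free(sort_ctx[i]);             /* "release" the sort memory */
    void *after_free = sbrk(0);

    printf("break: start=%p  after alloc=%p  after free=%p\n",
           before, after_alloc, after_free);
    /* Typically after_free == after_alloc: nothing went back to the OS. */
    return 0;
}

Typically the final break matches the post-allocation break, which is
exactly the intermixing problem described above.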
On Fri, 2005-09-23 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > If not, I would propose that when we move from qsort to tapesort mode we
> > free the larger work_mem setting (if one exists) and allocate only a
> > lower, though still optimal setting for the tapesort. That way the
> > memory can be freed for use by other users or the OS while the tapesort
> > proceeds (which is usually quite a while...).
>
> On most platforms it's quite unlikely that any memory would actually get
> released back to the OS before transaction end, because the memory
> blocks belonging to the tuplesort context will be intermixed with blocks
> belonging to other contexts. So I think this is pretty pointless.

I take it you mean pointless because of the way the memory allocation
works, rather than because giving memory back isn't worthwhile?

Surely the sort memory would be allocated in contiguous chunks? In some
cases we might be talking about more than a GB of memory, so it'd be good
to get that back ASAP. I'm speculating....

> (If you can't afford to have the sort using all of sort_mem, you've set
> sort_mem too large, anyway.)

Sort takes care to allocate only what it needs as it starts up. All I'm
suggesting is to take the same care when the sort mode changes. If the
above argument held water, then we would just allocate all the memory in
one lump at startup, "because we can afford to", so I don't buy that.

Since we know the predicted size of the sort set prior to starting the
sort node, could we not use that information to allocate memory
appropriately? i.e. if the sort size is predicted to be more than twice
the size of work_mem, then just move straight to the external sort
algorithm and set the work_mem down at the lower limit?

That is, unless somebody has evidence that having a very large memory has
any performance benefit for external sorting?

Best Regards, Simon Riggs
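A sketch of the heuristic being suggested, assuming the planner's size
estimate is available to the sort node; choose_sort_budget and
TAPE_SORT_MEM are illustrative names only, not PostgreSQL code:

/*
 * Hypothetical decision helper: estimated_bytes would come from the
 * planner's row-count and width estimates.  Names and thresholds are
 * illustrative only.
 */
#include <stddef.h>
#include <stdbool.h>

#define TAPE_SORT_MEM   (16 * 1024 * 1024)  /* empirical "good enough" floor */

static size_t
choose_sort_budget(double estimated_bytes, size_t work_mem_bytes,
                   bool *use_tapes)
{
    if (estimated_bytes > 2.0 * (double) work_mem_bytes)
    {
        /* Clearly won't fit: skip the in-memory attempt entirely ... */
        *use_tapes = true;
        /* ... and don't grab more memory than the tape sort can exploit. */
        return (work_mem_bytes < TAPE_SORT_MEM) ? work_mem_bytes : TAPE_SORT_MEM;
    }

    /* Might fit: take the full budget and try to sort in memory first. */
    *use_tapes = false;
    return work_mem_bytes;
}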
Simon Riggs <simon@2ndquadrant.com> writes:
> Since we know the predicted size of the sort set prior to starting the
> sort node, could we not use that information to allocate memory
> appropriately? i.e. if sort size is predicted to be more than twice the
> size of work_mem, then just move straight to the external sort algorithm
> and set the work_mem down at the lower limit?

Have you actually read the sort code?

During the run-forming phase it's definitely useful to eat all the memory
you can: that translates directly to longer initial runs and hence fewer
merge passes. During the run-merging phase it's possible that using less
memory would not hurt performance any, but as already stated, I don't
think it will actually end up cutting the backend's memory footprint ---
the sbrk point will be established during the run-forming phase and it's
unlikely to move back much until transaction end.

Also, if I recall the development of that code correctly, the reason for
using more than minimum memory during the merge phase is that writing or
reading lots of tuples at once improves sequentiality of access to the
temp files. So I'm not sure that cutting down the memory wouldn't hurt
performance.

			regards, tom lane
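The run-length / merge-pass tradeoff described here can be put in rough
numbers using textbook approximations (replacement selection forms runs of
about twice the workspace on random input; a merge with fan-in F needs
about log_F(runs) passes over the data). The figures below are
back-of-envelope only, not tuplesort measurements:

/* Back-of-envelope merge-pass arithmetic; compile with -lm. */
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double S = 10.0 * 1024;      /* input size, MB (10 GB) */
    double fanin = 6.0;          /* merge order */
    double mems[] = {16, 256, 1024};

    for (int i = 0; i < 3; i++)
    {
        double M = mems[i];
        double runs = S / (2.0 * M);    /* replacement selection: ~2M runs */
        double passes = ceil(log(runs) / log(fanin));

        if (runs <= 1.0)
            passes = 0;
        printf("work_mem=%5.0f MB -> ~%4.0f runs, %2.0f merge passes\n",
               M, runs, passes);
    }
    return 0;
}

For a 10 GB input, 16 MB of workspace implies roughly four merge passes
with a six-way merge, while 1 GB brings it down to a single pass.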
From: Simon Riggs <simon@2ndquadrant.com>
Sent: Sep 23, 2005 5:37 AM
Subject: [PERFORM] Releasing memory during External sorting?

>I have concerns about whether we are overallocating memory for use in
>external sorts. (All code relating to this is in tuplesort.c)
>
A decent external sorting algorithm, say a Merge Sort + Radix (or
Distribution Counting) hybrid with appropriate optimizations for small
sub-files, should become more effective / efficient the more RAM you give
it.

>The external sort algorithm benefits from some memory but not much.
>
That's probably an artifact of the PostgreSQL external sorting code and
_not_ due to some fundamental external sorting issue.

>Knuth says that the amount of memory required is very low, with a value
>typically less than 1 kB.
>
"Required" means the external sort can operate on that little memory. How
much memory is required for optimal performance is another matter.

>I/O overheads mean that there is benefit from having longer sequential
>writes, so the optimum is much larger than that. I've not seen any data
>that indicates that a setting higher than 16 MB adds any value at all to a
>large external sort.
>
It should. A first-pass upper bound would be the amount of RAM needed for
Replacement Selection to create a run (ie sort) of the whole file. That
should be ~ the amount of RAM needed to hold 1/2 the file in a Replacement
Selection pass. At the simplest, for any file over 32MB the optimum should
be more than 16MB.

>I have some indications from private tests that very high memory settings
>may actually hinder performance of the sorts, though I cannot explain that
>and wonder whether it is the performance tests themselves that have issues.
>
Hmmm. Are you talking about amounts so high that you are throwing the OS
into paging and swapping thrash behavior? If not, then the above is weird.

>Does anyone have any clear data that shows the value of large settings
>of work_mem when the data to be sorted is much larger than memory? (I am
>well aware of the value of setting work_mem higher for smaller sorts, so
>any performance data needs to reflect only very large sorts).
>
This is not PostgreSQL specific, but it does prove the point that the
performance of external sorts benefits greatly from large amounts of RAM
being available:

http://research.microsoft.com/barc/SortBenchmark/

Looking at the particulars of the algorithms listed there should shed a
lot of light on what a "good" external sorting algorithm looks like:

1= HD IO matters the most.
1a= Seeking behavior is the largest factor in poor performance.
2= No optimal external sorting algorithm should use more than 2 passes.
3= Optimal external sorting algorithms should use 1 pass if at all possible.
4= Use as much RAM as possible, and use it as efficiently as possible.
5= The amount of RAM needed to hide the latency of a HD subsystem goes up
   as the _square_ of the difference between the bandwidth of the HD
   subsystem and memory.
6= Be cache friendly.
7= For large numbers of records whose sorting key is substantially smaller
   than the record itself, use a pointer + compressed key representation
   and write the data to HD in sorted order (Replace HD seeks with RAM
   seeks. Minimize RAM seeks).
8= Since your performance will be constrained by HD IO first and RAM IO
   second, up to a point it is worth it to spend more CPU cycles to save
   on IO.

Given the large and growing gap between CPU IO, RAM IO, and HD IO, these
issues are becoming more important for _internal_ sorts as well.

>Feedback, please.
>
>Best Regards, Simon Riggs
>
Hope this is useful,
Ron
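For reference, a toy replacement-selection run-formation sketch in C; the
"~1/2 the file for a single run" figure above follows from the classic
result that runs average about twice the workspace on random input. This
is an illustration of the technique only, not tuplesort.c code, and it
sorts plain ints standing in for tuples:

/*
 * Replacement selection over an int array: a min-heap keyed by (run, key).
 * An incoming value smaller than the one just written cannot join the
 * current run, so it is tagged with the next run number.
 */
#include <stdio.h>
#include <stdlib.h>

#define HEAP_CAP 4   /* tiny "work_mem" so the effect is visible */

typedef struct { int key; int run; } Entry;

static Entry heap[HEAP_CAP];
static int   heap_size;

static void
sift_down(int i)
{
    for (;;)
    {
        int l = 2 * i + 1, r = l + 1, s = i;

        if (l < heap_size && (heap[l].run < heap[s].run ||
            (heap[l].run == heap[s].run && heap[l].key < heap[s].key)))
            s = l;
        if (r < heap_size && (heap[r].run < heap[s].run ||
            (heap[r].run == heap[s].run && heap[r].key < heap[s].key)))
            s = r;
        if (s == i)
            return;
        Entry t = heap[i]; heap[i] = heap[s]; heap[s] = t;
        i = s;
    }
}

int
main(void)
{
    int input[] = {5, 9, 1, 7, 3, 8, 2, 6, 4, 0, 11, 10};
    int n = sizeof(input) / sizeof(input[0]);
    int next = 0;

    /* Fill the heap ("work_mem") from the input, then heapify. */
    while (heap_size < HEAP_CAP && next < n)
        heap[heap_size++] = (Entry){ input[next++], 0 };
    for (int i = heap_size / 2 - 1; i >= 0; i--)
        sift_down(i);

    while (heap_size > 0)
    {
        Entry out = heap[0];

        printf("run %d: %d\n", out.run, out.key);   /* "write to tape" */

        if (next < n)
        {
            int k = input[next++];

            /* Smaller than what we just wrote? It must start the next run. */
            heap[0] = (Entry){ k, (k < out.key) ? out.run + 1 : out.run };
        }
        else
            heap[0] = heap[--heap_size];
        if (heap_size > 0)
            sift_down(0);
    }
    return 0;
}

With a heap of four entries, the twelve-element input above comes out as
two runs of six each, i.e. runs noticeably longer than the workspace.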
Ron Peacetree <rjpeace@earthlink.net> writes:
> 2= No optimal external sorting algorithm should use more than 2 passes.
> 3= Optimal external sorting algorithms should use 1 pass if at all possible.

A comparison-based sort must use at least N log N operations, so it would
appear to me that if you haven't got approximately log N passes then your
algorithm doesn't work.

			regards, tom lane
operations != passes. If you were clever, you could probably write a
modified bubble-sort algorithm that only made 2 passes. A pass is a disk
scan; operations are then performed (hopefully in memory) on what you read
from the disk. So there's no theoretical log N lower bound on the number
of disk passes.

Not that I have anything else useful to add to this discussion, just a
tidbit I remembered from my CS classes back in college :)

-- Mark

On Fri, 2005-09-23 at 13:17 -0400, Tom Lane wrote:
> Ron Peacetree <rjpeace@earthlink.net> writes:
> > 2= No optimal external sorting algorithm should use more than 2 passes.
> > 3= Optimal external sorting algorithms should use 1 pass if at all possible.
>
> A comparison-based sort must use at least N log N operations, so it
> would appear to me that if you haven't got approximately log N passes
> then your algorithm doesn't work.
>
> 			regards, tom lane
Mark Lewis <mark.lewis@mir3.com> writes:
> operations != passes. If you were clever, you could probably write a
> modified bubble-sort algorithm that only made 2 passes. A pass is a disk
> scan; operations are then performed (hopefully in memory) on what you
> read from the disk. So there's no theoretical log N lower bound on the
> number of disk passes.

Given infinite memory that might be true, but I don't think I believe it
for limited memory. If you have room for K tuples in memory then it's
impossible to perform more than K*N useful comparisons per pass (ie, as
each tuple comes off the disk you can compare it to all the ones currently
in memory; anything more is certainly redundant work). So if K < log N
it's clearly not gonna work.

It's possible that you could design an algorithm that works in a fixed
number of passes if you are allowed to assume you can hold O(log N) tuples
in memory --- and in practice that would probably work fine, if the
constant factor implied by the O() isn't too big. But it's not really
solving the general external-sort problem.

			regards, tom lane
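The counting argument above, written out symbolically (this only restates
the bound as given, with c standing for the constant in the comparison-sort
lower bound):

\[
  \text{passes} \times \underbrace{K N}_{\text{useful comparisons per pass}}
  \;\ge\; \underbrace{c\, N \log_2 N}_{\text{comparisons required}}
  \quad\Longrightarrow\quad
  \text{passes} \;\ge\; \frac{c \log_2 N}{K},
\]

so a bounded number of passes needs $K = \Omega(\log N)$ tuples held in
memory, which is the boundary case described above.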
Yep. Also, bear in mind that the lg(n!) ~= n lg n - n lower bound on the
number of comparisons:

a= says nothing about the amount of data movement used.
b= only holds for generic comparison-based sorting algorithms.

As Knuth says (vol 3, p180), Distribution Counting sorts without ever
comparing elements to each other at all, and so does Radix Sort. Similar
comments can be found in many algorithms texts.

Any time we know that the range of the data to be sorted is substantially
restricted compared to the number of items to be sorted, we can sort in
less than O(lg(n!)) time. DB fields tend to take on few values and are
therefore "substantially restricted".

Given the proper resources and algorithms, O(n) sorts are very plausible
when sorting DB records. All of the fastest external sorts of the last
decade or so take advantage of this. Check out that URL I posted.

Ron
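A minimal distribution-counting sort in C of the kind referred to above,
assuming the keys fall in a small known range (here a single byte); this
is an illustration of the technique only, not code from any particular
sorter:

/*
 * Distribution counting ("counting sort"): no key comparisons at all.
 * Assumes keys fall in a small, known range [0, RANGE) -- the
 * "substantially restricted" case described above.
 */
#include <stdio.h>
#include <string.h>

#define RANGE 256   /* e.g. a one-byte key, or one radix digit */

static void
counting_sort(const unsigned char *in, unsigned char *out, size_t n)
{
    size_t count[RANGE + 1];

    memset(count, 0, sizeof(count));
    for (size_t i = 0; i < n; i++)          /* histogram: one pass */
        count[in[i] + 1]++;
    for (int k = 0; k < RANGE; k++)         /* prefix sums -> output offsets */
        count[k + 1] += count[k];
    for (size_t i = 0; i < n; i++)          /* scatter: second pass, stable */
        out[count[in[i]]++] = in[i];
}

int
main(void)
{
    unsigned char data[] = {42, 7, 200, 7, 13, 42, 0, 255};
    unsigned char sorted[sizeof(data)];

    counting_sort(data, sorted, sizeof(data));
    for (size_t i = 0; i < sizeof(data); i++)
        printf("%u ", sorted[i]);
    printf("\n");
    return 0;
}

Two linear passes over the data plus one over the (small) count array, and
no element is ever compared with another, which is why the lg(n!) bound
does not apply.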
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Sep 23, 2005 2:15 PM
Subject: Re: [PERFORM] Releasing memory during External sorting?

>Mark Lewis <mark.lewis@mir3.com> writes:
>> operations != passes. If you were clever, you could probably write a
>> modified bubble-sort algorithm that only made 2 passes. A pass is a disk
>> scan; operations are then performed (hopefully in memory) on what you
>> read from the disk. So there's no theoretical log N lower bound on the
>> number of disk passes.
>
>Given infinite memory that might be true, but I don't think I believe it
>for limited memory. If you have room for K tuples in memory then it's
>impossible to perform more than K*N useful comparisons per pass (ie, as
>each tuple comes off the disk you can compare it to all the ones
>currently in memory; anything more is certainly redundant work). So if
>K < log N it's clearly not gonna work.
>
Actually, it's far better than that. I recall a paper I saw in one of the
algorithms journals 15+ years ago that proved that if you knew the range
of the data, regardless of what that range was, and had n^2 space, you
could sort n items in O(n) time.

Turns out that with very modest constraints on the range of the data and
substantially less extra space (about the same as you'd need for
Replacement Selection + External Merge Sort), you can _still_ sort in O(n)
time.

>It's possible that you could design an algorithm that works in a fixed
>number of passes if you are allowed to assume you can hold O(log N)
>tuples in memory --- and in practice that would probably work fine,
>if the constant factor implied by the O() isn't too big. But it's not
>really solving the general external-sort problem.
>
If you know nothing about the data to be sorted and must guard against the
worst possible edge cases, AKA the classic definition of "the general
external sorting problem", then one can't do better than some variant of
Replacement Selection + Unbalanced Multiway Merge.

OTOH, ITRW things are _not_ like that. We know the range of the data in
our DB fields or we can safely assume it to be relatively constrained.
This allows us access to much better external sorting algorithms.

For example Postman Sort (the 2005 winner of the PennySort benchmark) is
basically an IO-optimized version of an external Radix Sort.

Ron
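For fixed-width keys, the standard cost accounting for an LSD radix sort
built out of counting passes like the one above is (textbook figures, not
measurements of Postman Sort itself):

\[
  \text{passes} \;=\; \left\lceil \frac{b}{r} \right\rceil,
  \qquad
  \text{work} \;=\; O\!\left(\left\lceil \frac{b}{r} \right\rceil
  \bigl(n + 2^{r}\bigr)\right),
\]

where b is the key width in bits and r the digit width in bits per pass;
for fixed b the total is linear in n, which is the sense in which such
sorts are O(n).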
On Fri, 2005-09-23 at 11:31 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Since we know the predicted size of the sort set prior to starting the
> > sort node, could we not use that information to allocate memory
> > appropriately? i.e. if sort size is predicted to be more than twice the
> > size of work_mem, then just move straight to the external sort algorithm
> > and set the work_mem down at the lower limit?
>
> Have you actually read the sort code?

Yes, and Knuth too. Your research and code are incredible, almost
untouchable. Yet sort performance is important and empirical evidence
suggests that this can be improved upon significantly, so I am and will
be spending time trying to improve upon that. Another time...

This thread was aiming to plug a problem I saw with 8.1's ability to use
very large work_mem settings. I felt that either my performance numbers
were wrong or we needed to do something; I've not had anybody show me
performance numbers that prove mine doubtful, yet.

> During the run-forming phase it's definitely useful to eat all the
> memory you can: that translates directly to longer initial runs and
> hence fewer merge passes.

Sounds good, but maybe that is not the dominant effect. I'll retest, on
the assumption that there is a benefit but there's something wrong with my
earlier tests.

> During the run-merging phase it's possible
> that using less memory would not hurt performance any, but as already
> stated, I don't think it will actually end up cutting the backend's
> memory footprint --- the sbrk point will be established during the run
> forming phase and it's unlikely to move back much until transaction end.
> Also, if I recall the development of that code correctly, the reason for
> using more than minimum memory during the merge phase is that writing or
> reading lots of tuples at once improves sequentiality of access to the
> temp files. So I'm not sure that cutting down the memory wouldn't hurt
> performance.

Cutting memory below about 16 MB does definitely hurt external sort
performance; I explain that as being the effect of sequential access. I
haven't looked to nail down the breakpoint exactly, since it seemed more
important simply to say that there looked like there was one. It's just
that raising it above that mark doesn't help much, according to my current
results.

I'll get some more test results and repost them next week. I will be very
happy if the results show that more memory helps.

Best Regards, Simon Riggs
On Fri, 2005-09-23 at 12:48 -0400, Ron Peacetree wrote:
> > I have some indications from private tests that very high memory settings
> > may actually hinder performance of the sorts, though I cannot explain that
> > and wonder whether it is the performance tests themselves that have issues.
> >
> Hmmm. Are you talking about amounts so high that you are throwing the OS
> into paging and swapping thrash behavior? If not, then the above is weird.

Thanks for your thoughts. I'll retest, on the assumption that there is a
benefit but there's something wrong with my earlier tests.

Best Regards, Simon Riggs