Re: Merge algorithms for large numbers of "tapes"

From: Stephen Frost
Subject: Re: Merge algorithms for large numbers of "tapes"
Date:
Msg-id: 20060309234856.GP4474@ns.snowman.net
In response to: Re: Merge algorithms for large numbers of "tapes" (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "Luke Lonergan" <llonergan@greenplum.com> writes:
> > I would only suggest that we replace the existing algorithm with one that
> > will work regardless of (reasonable) memory requirements.  Perhaps we can
> > agree that at least 1MB of RAM for external sorting will always be available
> > and proceed from there?
>
> If you can sort indefinitely large amounts of data with 1MB work_mem,
> go for it.

It seems you two are talking past each other, and I'm at least
slightly confused.  So I'd like to ask for a bit of clarification;
perhaps that will help everyone.

#1: I'm as much a fan of eliminating unnecessary code as anyone
#2: There have been claims that a two-pass merge improves sort
    performance by as much as 400%
#3: Supposedly a two-pass merge requires memory on the order of
    sqrt(total input size); see the back-of-envelope below
#4: We have planner statistics with which to estimate that total size
#5: We have a work_mem limitation for a reason
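
On #3, the usual back-of-envelope (my sketch of the standard
analysis, not anything from this thread) goes like this: with M =
work_mem you write sorted runs of length about M; a single merge
pass, giving each run a B-byte read buffer, can combine M/B runs; so
two passes handle M * (M/B) = M^2/B bytes, meaning you need M >=
sqrt(total * B).  For a fixed buffer size that is "order sqrt(total)".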

So, if we get a huge performance increase, what's wrong with:

    if (sqrt(est(total)) <= work_mem)
        two_pass_sort();
    else
        tape_sort();

?
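
To be concrete, here's that dispatch as a minimal, compilable C
sketch.  Every name in it (can_two_pass, est_total_bytes, BUF_SIZE)
is mine and purely illustrative, not anything in the tree:

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define BUF_SIZE 8192.0   /* assume one 8 kB block buffered per run */

    /* The two-pass bound from above: work_mem >= sqrt(total * B) */
    static bool
    can_two_pass(double est_total_bytes, double work_mem_bytes)
    {
        return work_mem_bytes >= sqrt(est_total_bytes * BUF_SIZE);
    }

    int
    main(void)
    {
        double work_mem = 32.0 * 1024 * 1024;   /* 32 MB */

        /* 100 GB of input: sqrt(100e9 * 8192) is roughly 29 MB */
        printf("%s\n", can_two_pass(100e9, work_mem)
                           ? "two-pass-sort" : "tape-sort");
        return 0;
    }

Plugging in numbers, 100 GB of input with 8 kB buffers needs only
about 29 MB of work_mem for two passes, which is why the two-pass
path looks attainable at realistic settings.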

If the performance isn't much different and tape-sort can do it with
less memory, then I don't really see any point in removing it.

If the intent is to remove it and then ask for the default work_mem to
be increased, I doubt going about it this way would work very well. :)

Thanks,
    Stephen
