Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Sorting Improvements for 8.4
Msg-id 1196721318.22428.477.camel@dogma.ljc.laika.com
In response to Re: Sorting Improvements for 8.4  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
On Mon, 2007-12-03 at 20:40 +0000, Gregory Stark wrote:
> So the question is just how many seeks are we doing during sorting. If we're
> doing 0.1% seeks and 99.9% sequential i/o then eliminating the 0.1% entirely
> (which we can't do) isn't going to speed up sorting all that much. If we're
> doing 20% seeks and can get that down to 10% it might be worthwhile.

It's not just about eliminating seeks, it's about being able to merge
more runs at one time.

If you are merging 10 runs at once, and only two of those runs overlap
while the rest hold much greater values, you might be spending 99% of
the time in sequential I/O.

But the point is, the buffers holding those other 8 runs are wasted
memory (80% of the memory you're using), so we really could be merging
a lot more than 10 runs at once. That might eliminate entire passes
from the merge.
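As a rough illustration of why a wider merge can remove whole passes, here is a simplified model. It assumes a balanced merge in which every pass combines groups of up to `merge_order` runs into one; tuplesort.c actually used a polyphase merge at the time, so real pass counts differ. `merge_passes` is a hypothetical helper, not PostgreSQL code.

```python
def merge_passes(num_runs: int, merge_order: int) -> int:
    """Count merge passes needed to reduce num_runs sorted runs to one.

    Simplified balanced-merge model (an assumption): each pass merges
    groups of up to merge_order runs, so the run count shrinks by that
    factor (rounded up) per pass.
    """
    if merge_order < 2:
        raise ValueError("merge_order must be at least 2")
    passes = 0
    while num_runs > 1:
        # integer ceiling division: a partial group still yields one run
        num_runs = (num_runs + merge_order - 1) // merge_order
        passes += 1
    return passes


# Same number of runs, wider merge: one fewer pass over all the data.
print(merge_passes(1000, 10))  # 3
print(merge_passes(1000, 40))  # 2
```

If most of the per-tape buffers are idle because their runs hold much greater values, that memory could instead raise the merge order, which in this model is what drops a pass.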

My point is just that "how many seeks are we doing" is not the only
question. We could be doing 99% sequential I/O and still make huge wins.

In reality, of course, the runs aren't going to be completely disjoint,
but they may be partially disjoint. That's where forecasting comes in:
you preread from the tapes you will actually need tuples from soonest.
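A minimal sketch of the forecasting idea, assuming each run is stored as fixed-size blocks and the merge holds one current block per tape. A tape's buffer runs dry exactly when the merge outputs the last key it holds, so the next block needed always comes from the tape whose last buffered key is smallest, and that order is known before the reads happen. The function names and block layout are hypothetical; this is not tuplesort.c code. `simulate_read_order` runs a plain heap merge only to check the prediction.

```python
import heapq
from typing import List, Sequence, Tuple


def split_blocks(run: Sequence[int], block_size: int) -> List[List[int]]:
    """Split one sorted run (one 'tape') into fixed-size blocks."""
    return [list(run[i:i + block_size]) for i in range(0, len(run), block_size)]


def forecast_read_order(runs: Sequence[Sequence[int]],
                        block_size: int) -> List[Tuple[int, int]]:
    """Predict the order in which the merge needs each tape's next block.

    Block j of a tape is needed once the merge consumes the last key of
    block j-1, which is already in memory. Sorting by that key (ties by
    tape, then block) gives the preread order as (tape, block) pairs.
    """
    needs = []
    for tape, run in enumerate(runs):
        blocks = split_blocks(run, block_size)
        for j in range(1, len(blocks)):
            needs.append((blocks[j - 1][-1], tape, j))
    needs.sort()
    return [(tape, j) for _, tape, j in needs]


def simulate_read_order(runs: Sequence[Sequence[int]],
                        block_size: int) -> List[Tuple[int, int]]:
    """Plain k-way heap merge that records when each tape's block runs dry."""
    heap = [(run[0], tape, 0) for tape, run in enumerate(runs) if run]
    heapq.heapify(heap)
    order = []
    while heap:
        _, tape, pos = heapq.heappop(heap)
        run = runs[tape]
        nxt = pos + 1
        # pos was the last key of its block and another block follows
        if nxt % block_size == 0 and nxt < len(run):
            order.append((tape, nxt // block_size))
        if nxt < len(run):
            heapq.heappush(heap, (run[nxt], tape, nxt))
    return order


runs = [[1, 2, 3, 4, 5, 6], [100, 101, 102, 103], [7, 8, 9, 10]]
print(forecast_read_order(runs, 2))  # [(0, 1), (0, 2), (2, 1), (1, 1)]
```

In the example, tape 1 holds only large values, so its second block is needed last; a forecasting merge would spend its preread memory on tapes 0 and 2 first instead of holding tape 1's data early.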

Regards,
Jeff Davis


