Re: [HACKERS] sort on huge table - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] sort on huge table
Date
Msg-id 2726.941493808@sss.pgh.pa.us
In response to Re: [HACKERS] sort on huge table  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> If I am correct on the Linux seek thing, and Tatsuo is running Linux, is
> there any way to fake out the kernel on only Linux, so we issue two
> reads in a row before doing a seek?

I dunno.  I see that f_reada is turned off by a seek in the extract you
posted, but I wasn't clear on what turns it on again, nor what happens
after it is turned on.

After further thought I am not sure that read-ahead or lack of it is
the problem.  The changes I committed over the weekend were to try to
improve locality of access to the temp file by reading tuples from
logical tapes in bursts --- in a merge pass that's reading N logical
tapes, it now tries to grab SortMem/N bytes' worth of tuples off any one
source tape at a time, rather than just reading one 8K block at a time
from each tape as the first cut did.  That seemed to improve performance
on both my system and Tatsuo's, but his is still far below the speed of
the 6.5 code.  I'm not sure I understand why.  The majority of the block
reads or writes *should* be sequential now, given a reasonable SortMem
(and he tested with quite large settings).  I'm afraid there is some
aspect of the kernel's behavior on his system that we don't have a clue
about...
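
The locality argument above can be illustrated with a toy model.  The
sketch below is not the backend code: it assumes each logical tape
occupies one contiguous run of blocks in the temp file and that the
merge drains all tapes evenly, neither of which is stated in the
message.  The function name count_seeks and its parameters are
hypothetical.  It counts how many block reads are non-sequential when
reading one 8K block per tape per turn versus SortMem/N bytes per tape
per turn.

```python
BLOCK_SIZE = 8192  # 8K block, as mentioned in the message


def count_seeks(num_tapes, blocks_per_tape, sort_mem_bytes, burst):
    """Count non-sequential block reads in a simplified merge pass.

    Assumptions (not from the original message): every tape is a
    contiguous run of blocks in one temp file, and the merge visits
    the tapes round-robin until all are exhausted.
    """
    if burst:
        # Burst mode: read about SortMem/N bytes from one tape per turn.
        blocks_per_read = max(1, sort_mem_bytes // num_tapes // BLOCK_SIZE)
    else:
        # First-cut behaviour: one block from each tape per turn.
        blocks_per_read = 1

    next_block = [t * blocks_per_tape for t in range(num_tapes)]
    end_block = [(t + 1) * blocks_per_tape for t in range(num_tapes)]

    seeks = 0
    last = None  # last block number read from the temp file
    while any(next_block[t] < end_block[t] for t in range(num_tapes)):
        for t in range(num_tapes):
            for _ in range(blocks_per_read):
                if next_block[t] >= end_block[t]:
                    break
                block = next_block[t]
                # A read is sequential only if it directly follows
                # the previous block in the file.
                if last is None or block != last + 1:
                    seeks += 1
                last = block
                next_block[t] += 1
    return seeks


if __name__ == "__main__":
    sort_mem = 512 * 1024  # hypothetical SortMem setting, in bytes
    print("block-at-a-time:", count_seeks(4, 64, sort_mem, burst=False))
    print("burst reads:    ", count_seeks(4, 64, sort_mem, burst=True))
```

Under these assumptions, with 4 tapes of 64 blocks each and 512 KB of
sort memory, every one of the 256 reads is a seek in block-at-a-time
mode, while burst mode needs only 16.  The model ignores kernel
read-ahead entirely, so it says nothing about the unexplained slowdown
on Tatsuo's system.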
        regards, tom lane

