Re: [HACKERS] sort on huge table - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] sort on huge table
Date
Msg-id 199911011800.NAA20652@candle.pha.pa.us
In response to Re: [HACKERS] sort on huge table  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] sort on huge table  ("Aaron J. Seigo" <aaron@gtv.ca>)
List pgsql-hackers
> Next question is what to do about it.  I don't suppose we have any way
> of turning off the OS' read-ahead algorithm :-(.  We could forget about
> this space-recycling improvement and go back to separate temp files.
> The objection to that, of course, is that while sorting might be faster,
> it doesn't matter how fast the algorithm is if you don't have the disk
> space to execute it.


Look what I found. I downloaded the Linux kernel source for 2.2.0, and
started looking for the word 'ahead' in the file system files.  I found
that read-ahead seems to be controlled by f_reada, and look where I
found it being turned off:  in the lseek code.  Seems like any seek that
changes the file position turns off read-ahead on Linux.

When you do a read or write, it seems to be turned on again.  Once you
read/write, the next read/write will do read-ahead, assuming you don't
do any lseek() before the second read/write().

Seems like the algorithm in psort now rarely gets read-ahead on
Linux, while other OSes check to see whether the read-ahead data was
eventually used, and control read-ahead that way.

Read-ahead also seems to be off on the first read from a file.
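To make the effect on a seek-heavy access pattern concrete, here is a toy model of the flag behaviour as I read it above. It is only a sketch: the toy_* names are made up for illustration, nothing here is kernel code, and it assumes the flag works exactly as described (off on open, cleared by a seek that moves the position, set again by any read).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not kernel code. */
struct toy_file {
    long long pos;   /* current offset, standing in for f_pos */
    int reada;       /* 1 if the next read would use read-ahead (f_reada) */
};

/* A freshly opened file starts with read-ahead off. */
void toy_open(struct toy_file *f)
{
    f->pos = 0;
    f->reada = 0;
}

/* A seek that moves the position clears the flag; a seek to the
 * current position leaves it alone. */
long long toy_lseek(struct toy_file *f, long long offset)
{
    assert(offset >= 0);
    if (offset != f->pos) {
        f->pos = offset;
        f->reada = 0;
    }
    return offset;
}

/* Report whether this read had read-ahead on, then advance the
 * position and turn the flag on for the next read. */
int toy_read(struct toy_file *f, size_t len)
{
    int had_reada = f->reada;

    f->pos += (long long) len;
    f->reada = 1;
    return had_reada;
}

/* Seek to each block in turn and read it; count the reads that
 * had read-ahead enabled. */
int toy_count_reada(const long long *blocks, int nblocks, size_t blksz)
{
    struct toy_file f;
    int i, hits = 0;

    toy_open(&f);
    for (i = 0; i < nblocks; i++) {
        toy_lseek(&f, blocks[i] * (long long) blksz);
        hits += toy_read(&f, blksz);
    }
    return hits;
}
```

Under this model, reading blocks 0, 1, 2, 3 in order gets read-ahead on three of the four reads (all but the first), while reading them in the order 3, 0, 2, 1 gets it on none, because every seek moves the position just before the read.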

---------------------------------------------------------------------------

/*
 *  linux/fs/ext2/file.c
 */
...
/*
 * Make sure the offset never goes beyond the 32-bit mark..
 */
static long long ext2_file_lseek(
    struct file *file,
    long long offset,
    int origin)
{
    struct inode *inode = file->f_dentry->d_inode;

    switch (origin) {
        case 2:
            offset += inode->i_size;
            break;
        case 1:
            offset += file->f_pos;
    }
    if (((unsigned long long) offset >> 32) != 0) {
#if BITS_PER_LONG < 64
        return -EINVAL;
#else
        if (offset > ext2_max_sizes[EXT2_BLOCK_SIZE_BITS(inode->i_sb)])
            return -EINVAL;
#endif
    }
    if (offset != file->f_pos) {
        file->f_pos = offset;
        file->f_reada = 0;
        file->f_version = ++event;
    }
    return offset;
}


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

