Re: Sequential Scan Read-Ahead - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Sequential Scan Read-Ahead
Msg-id 200204250156.g3P1ufh05751@candle.pha.pa.us
In response to Re: Sequential Scan Read-Ahead  (Curt Sampson <cjs@cynic.net>)
Responses Re: Sequential Scan Read-Ahead  (Curt Sampson <cjs@cynic.net>)
List pgsql-hackers
Curt Sampson wrote:
> On Wed, 24 Apr 2002, Bruce Momjian wrote:
> 
> > We expect the file system to do read-aheads during a sequential scan.
> > This will not happen if someone else is also reading buffers from that
> > table in another place.
> 
> Right. The essential difficulties are, as I see it:
> 
>     1. Not all systems do readahead.

If they don't, that isn't our problem.  We expect it to be there, and if
it isn't, the vendor/kernel is at fault.

>     2. Even systems that do do it cannot always reliably detect that
>     they need to.

Yes, a seek() within the file will turn off read-ahead.  Grabbing bigger
chunks would help here, but if two people are already reading from the
same file, grabbing bigger chunks of it may not be optimal.
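To illustrate point 2 and the seek() remark, here is a deliberately simplified, hypothetical model of a sequential-detection heuristic. It is not any particular kernel's algorithm: read-ahead fires only after `RA_THRESHOLD` consecutive reads of adjacent blocks, and any non-adjacent read resets the count. The names `ra_state`, `ra_observe`, `RA_THRESHOLD`, `count_ra_single` and `count_ra_interleaved` are invented for this sketch, and it assumes both scans share one read-ahead state. Some kernels track that state per open file instead, in which case the interference shows up as disk-head movement rather than lost detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a kernel read-ahead heuristic.
 * Real kernels differ; this only shows why a seek, or a second reader
 * sharing the same read-ahead state, defeats sequential detection. */
typedef struct {
    long next_expected;  /* block number the heuristic expects next */
    int  seq_run;        /* consecutive adjacent reads seen so far */
} ra_state;

#define RA_THRESHOLD 2   /* hypothetical: adjacent reads needed to prefetch */

/* Record a read of block `blkno`; return true if read-ahead would fire. */
static bool ra_observe(ra_state *st, long blkno)
{
    assert(st != NULL);
    if (blkno == st->next_expected)
        st->seq_run++;
    else
        st->seq_run = 1;          /* looks like a seek: restart the count */
    st->next_expected = blkno + 1;
    return st->seq_run >= RA_THRESHOLD;
}

/* One scan reads blocks 0..nblocks-1; count reads that trigger read-ahead. */
static int count_ra_single(long nblocks)
{
    ra_state st = { -1, 0 };
    int hits = 0;
    for (long i = 0; i < nblocks; i++)
        hits += ra_observe(&st, i);
    return hits;
}

/* Two scans interleave on ONE shared state: scan A reads 0, 1, 2, ...
 * while scan B reads b_start, b_start + 1, ... in lockstep. */
static int count_ra_interleaved(long nblocks, long b_start)
{
    ra_state st = { -1, 0 };
    int hits = 0;
    for (long i = 0; i < nblocks; i++) {
        hits += ra_observe(&st, i);            /* scan A */
        hits += ra_observe(&st, b_start + i);  /* scan B */
    }
    return hits;
}
```

In this model, a lone scan of 10 blocks triggers read-ahead on 9 of its 10 reads (every read after the first). Interleaving a second scan that starts at block 100 drops that to 0, because every read now looks like a seek relative to the one before it.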

>     3. Even when the read-ahead does occur, you're still doing more
>     syscalls, and thus more expensive kernel/userland transitions, than
>     you have to.

I would guess the performance impact is minimal.
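On point 3, the number of syscalls at least is easy to count. The sketch below uses hypothetical helper names (`scan_fd`, `make_test_file`); `BLCKSZ` mirrors PostgreSQL's default 8 KB block size. It reads a file sequentially with plain POSIX read() calls of a given size and counts them. It does not time anything, so it says nothing about whether the extra kernel/userland transitions actually matter; it only shows how the call count scales with the request size.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192  /* PostgreSQL's default block size */

/* Read everything behind `fd` sequentially, `chunk` bytes per read() call.
 * Stores the number of read() calls issued in *ncalls (including the final
 * zero-byte read that reports EOF) and returns the total bytes read, or -1
 * on error. */
static long scan_fd(int fd, size_t chunk, long *ncalls)
{
    static char buf[16 * BLCKSZ];  /* large enough for chunk <= 128 KB */
    long total = 0;

    assert(chunk > 0 && chunk <= sizeof buf);
    *ncalls = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        (*ncalls)++;
        if (n < 0)
            return -1;
        if (n == 0)
            break;                 /* EOF */
        total += n;
    }
    return total;
}

/* Hypothetical demo helper: create an unlinked temporary file holding
 * `nblocks` zero-filled blocks and return an fd positioned at offset 0. */
static int make_test_file(int nblocks)
{
    char path[] = "/tmp/seqscanXXXXXX";
    char block[BLCKSZ];
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                  /* file disappears once fd is closed */
    memset(block, 0, sizeof block);
    for (int i = 0; i < nblocks; i++) {
        if (write(fd, block, sizeof block) != (ssize_t) sizeof block) {
            close(fd);
            return -1;
        }
    }
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For a 64-block file, 8 KB reads take 65 calls (64 plus the final zero-length read at EOF), while 64 KB reads take 9. This assumes read() on a regular file returns the full request until EOF, which is typical but not guaranteed by POSIX.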

> Has anybody considered writing a storage manager that uses raw
> partitions and deals with its own buffer caching? This has the potential
> to be a lot more efficient, since the database server knows much more
> about its workload than the operating system can guess.

We have talked about it, but rejected it.  Look in TODO.detail under
optimizer and performance for 'raw'.  There is also interesting
information there about the optimizer cost estimates we have been
talking about.

Specifically see:
http://candle.pha.pa.us/mhonarc/todo.detail/performance/msg00009.html

Also see:
http://candle.pha.pa.us/mhonarc/todo.detail/optimizer/msg00011.html

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026