Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables? - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?
Date
Msg-id 9803122035.AA28177@hawk.illustra.com
In response to Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian writes:
> Yes, I have heard that the standard file system read-ahead is often
> useless for a database, so on a raw partition you know the next block
> that is going to be requested, so you can prefetch there rather than
> having the file system prefetch the next sequential block.

At least on the systems I am intimately familiar with, the prefetch that the
OS does (assuming a modern OS like Linux) is pretty hard to beat. If you have
a table that was bulk loaded in key order, a sequential scan is going to
result in a sequential access pattern to the underlying file and the OS
prefetch does the right thing. If you have an unindexed table with rows
inserted at the end, the OS prefetch still works. If you are using a secondary
index on some sort of chopped up table with rows inserted willy-nilly, then
it may be worth doing async reads in a burst and letting the disk request
sort make the best of it.

As far as I am aware, Postgres does not do async I/O. Perhaps it should.
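
A rough sketch of the "reads in a burst" idea, in Python rather than anything
from the Postgres source. It assumes an 8 KB block size and uses a thread pool
of blocking os.pread() calls as a stand-in for real async I/O; BLOCK_SIZE,
read_block and read_blocks_in_burst are made-up names for illustration only.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8192  # assumed page size for this sketch


def read_block(fd, blockno):
    # os.pread takes an explicit offset, so concurrent calls on one fd
    # do not disturb each other's file position.
    return os.pread(fd, BLOCK_SIZE, blockno * BLOCK_SIZE)


def read_blocks_in_burst(path, blocknos, workers=8):
    """Submit all block reads together and return {blockno: bytes}.

    Up to `workers` requests are outstanding at once, which gives the
    kernel and the disk a queue of requests that they can reorder.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        wanted = list(dict.fromkeys(blocknos))  # drop duplicates, keep order
        with ThreadPoolExecutor(max_workers=workers) as pool:
            data = pool.map(lambda b: read_block(fd, b), wanted)
            return dict(zip(wanted, data))
    finally:
        os.close(fd)
```

The caller would collect the block numbers from the index first, then hand
the whole list over in one call instead of fetching one block per index entry.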

> Also nice so you can control what gets written to disk/fsync'ed and what doesn't
> get fsync'ed.

This is really the big win.

> Our idea is to control when pg_log gets written to disk.  We keep active
> pg_log pages in shared memory, and every 30-60 seconds, we make a memory
> copy of the current pg_log active pages, do a system sync() (which
> happens anyway at that interval), update the pg_log file with the saved
> changes, and fsync() the pg_log pages to disk.  That way, after a crash,
> the current database only shows transactions as committed where we are
> sure all the data has made it to disk.

OK as far as it goes, but probably bad for concurrency if I have understood
you.
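
For reference, the scheme quoted above can be sketched roughly as follows.
This is a toy model under loud assumptions, not the actual pg_log format or
code: it keeps one status byte per transaction id, and DelayedCommitLog and
its methods are hypothetical names.

```python
import os


class DelayedCommitLog:
    """Toy commit-status log: one byte per transaction id (made-up layout)."""

    def __init__(self, path, max_xids):
        self.path = path
        self.active = bytearray(max_xids)  # "active pages" held in memory

    def mark_committed(self, xid):
        # Only the in-memory copy changes; nothing is durable yet.
        self.active[xid] = 1

    def flush(self):
        snapshot = bytes(self.active)  # 1. memory copy of the active pages
        os.sync()                      # 2. ask the OS to write out dirty data
        fd = os.open(self.path, os.O_WRONLY | os.O_CREAT, 0o600)
        with os.fdopen(fd, "wb") as f:
            f.write(snapshot)          # 3. update the log file from the copy
            f.flush()
            os.fsync(f.fileno())       # 4. force the log itself to disk

    def committed_on_disk(self, xid):
        # What a reader would see after a crash: only flushed commits.
        if not os.path.exists(self.path):
            return False
        with open(self.path, "rb") as f:
            f.seek(xid)
            return f.read(1) == b"\x01"
```

In this model a timer would call flush() every 30-60 seconds. A transaction
marked committed after the snapshot is taken does not appear on disk until
the next flush.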

> I have a more detailed posting if you are interested.

Thanks, I will read it. Probably should hold more comments until after that ;-)

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - I realize now that irony has no place in business communications.

