Re: Caching by Postgres - Mailing list pgsql-performance

From Gavin Sherry
Subject Re: Caching by Postgres
Date
Msg-id Pine.LNX.4.58.0508241425080.5141@linuxworld.com.au
In response to Re: Caching by Postgres  (PFC <lists@boutiquenumerique.com>)
Responses Re: Caching by Postgres
List pgsql-performance
On Wed, 24 Aug 2005, PFC wrote:

>
> > Josh Berkus has already mentioned this as conventional wisdom as written
> > by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc. have been
> > around for a long time; it was probably a clear performance win way back
> > when. Nowadays with how far open-source OS's have advanced, I'd take it
> > with a grain of salt and do my own performance analysis. I suspect the
> > big vendors wouldn't change their stance even if they knew it was no
> > longer true due to the support hassles.
>
>     Reinvent a filesystem... that would be suicidal.
>
>     Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking
> his Reiser4 especially for Postgres. In his own words, he wants a "Killer
> app for reiser4". Reiser4 will offer transactional semantics via a
> special reiser4 syscall, so it might be possible, with a minimum of
> changes to postgres (ie maybe just another sync mode besides fsync,
> fdatasync et al) to use this. Other interesting details were exposed on
> the reiser list, too (ie. a transactional filesystem can give ACID
> guarantees to postgres without the need for fsync()).
>
>     Very interesting.

Ummm... I don't see anything here which will be a win for Postgres. The
transactional semantics we're interested in are fairly complex:

1) Modifications to multiple objects can become visible to the system
atomically
2) On error, a series of modifications which had been grouped together
within a transaction can be rolled back
3) Using object version information, determine which version of which
object is visible to a given session
4) Using version information and locking, detect and resolve read/write
and write/write conflicts

Now, I can see a file system offering (1) and (2). But a file system that
can allow people to do (3) and (4) would require that we make *major*
modifications to how postgresql is implemented. Moreover, it would be for
no gain, since we've already written a system which can do it.
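
To make point (3) concrete, here is a toy sketch of version-based visibility. It is a deliberately simplified model, not PostgreSQL's actual visibility rules: the names RowVersion, Snapshot, is_visible and visible_versions are hypothetical, and a snapshot is assumed to be nothing more than the set of transaction ids whose effects a session may see.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class RowVersion:
    value: str
    created_by: int                   # id of the transaction that wrote this version
    deleted_by: Optional[int] = None  # id of the transaction that superseded it, if any


@dataclass(frozen=True)
class Snapshot:
    # Assumption: a snapshot is just the set of transactions visible to the session.
    committed: FrozenSet[int]


def is_visible(version: RowVersion, snap: Snapshot) -> bool:
    """A version is visible if its creator is visible and its deleter is not."""
    if version.created_by not in snap.committed:
        return False
    return version.deleted_by is None or version.deleted_by not in snap.committed


def visible_versions(versions: List[RowVersion], snap: Snapshot) -> List[RowVersion]:
    """Filter a row's version chain down to what this session may see."""
    return [v for v in versions if is_visible(v, snap)]
```

Two sessions holding different snapshots get different answers from the same stored versions. That per-session decision is what a file system offering only (1) and (2) has no way to make on the database's behalf.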

A filesystem could, in theory, help us by providing an API which allows us
to tell the file system either: the way we'd like it to read ahead, the
fact that we don't want it to read ahead or the way we'd like it to cache
(or not cache) data. The thing is, most OSes provide interfaces to do this
already and we make only little use of them (I'm thinking of
madvise() with MADV_SEQUENTIAL or MADV_RANDOM, posix_fadvise(), the various
flags to open() which AIX, HPUX, Solaris provide).
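
A minimal sketch of what such a hint looks like from user space, assuming a Unix-like system where Python exposes os.posix_fadvise (it is skipped where unavailable). The helper name read_with_hint and the 8192-byte chunk size are illustrative only and are not taken from PostgreSQL.

```python
import os


def read_with_hint(path, advice, chunk_size=8192):
    """Read a whole file after giving the kernel an access-pattern hint.

    `advice` is one of the os.POSIX_FADV_* constants, or None for no hint.
    Returns the number of bytes read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if advice is not None and hasattr(os, "posix_fadvise"):
            # offset=0 with len=0 applies the advice to the whole file.
            os.posix_fadvise(fd, 0, 0, advice)
        total = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        return total
    finally:
        os.close(fd)
```

A sequential scan would pass os.POSIX_FADV_SEQUENTIAL and an index-style access pattern os.POSIX_FADV_RANDOM. The hint is advisory: the kernel may change its read-ahead behaviour or ignore it, and the data returned is the same either way.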

Gavin
