Re: adding support for posix_fadvise() - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: adding support for posix_fadvise()
Date	November 3, 2003 13:39:12
Msg-id	13972.1067870303@sss.pgh.pa.us
In response to	adding support for posix_fadvise() (Neil Conway <neilc@samurai.com>)
Responses	Re: adding support for posix_fadvise()
List	pgsql-hackers

Neil Conway <neilc@samurai.com> writes:
> So what API is desirable for uses 2-4? I'm thinking of adding a new
> function to the smgr API, smgradvise().

It's a little premature to be inventing APIs when you have no evidence
that this will make any useful performance difference.  I'd recommend a
quick hack to get proof of concept before you bother with nice APIs.

> Given a Relation and an advice, this would:
> (a) propagate the advice for this relation to all the open FDs for the
> relation

"All"?  You cannot affect the FDs being used by other backends.  It's
fairly unclear to me what the posix_fadvise function is really going
to do for files that are being accessed by multiple processes.  For
instance, is there any value in setting POSIX_FADV_DONTNEED on a WAL
file, given that every other backend is going to have that same file
open?  I would expect that rational kernel behavior would be to ignore
this advice unless it's set by the last backend to have the file open
--- but I'm not sure we can synchronize the closing of old WAL segments
well enough to know which backend is the last to close the file.

A related problem is that the smgr uses the same FD to access the same
relation no matter how many scans are in progress.  Think about a
complex query that is doing both a seqscan and an indexscan on the same
relation (a self-join could easily do this).  You'd really need to
change this if you want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to
get set usefully.
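
As a rough illustration of the point above (not code from the PostgreSQL tree): posix_fadvise() takes a file descriptor, so giving each scan type its own advice means giving each its own descriptor. The sketch below assumes a platform that provides posix_fadvise(); open_with_advice() is a hypothetical helper, not part of smgr or fd.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* expose posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open "path" read-only on a private descriptor and
 * apply "advice" to the whole file (offset 0 with len 0 means "through
 * end of file").  Returns the descriptor, or -1 if open() or
 * posix_fadvise() fails.
 */
int
open_with_advice(const char *path, int advice)
{
    int         fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* posix_fadvise() returns 0 on success or an error number, not -1 */
    if (posix_fadvise(fd, 0, 0, advice) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}
```

A seqscan and an indexscan over the same file would then call open_with_advice(path, POSIX_FADV_SEQUENTIAL) and open_with_advice(path, POSIX_FADV_RANDOM) respectively and read through separate descriptors. Whether the kernel actually treats the two descriptors differently is exactly the kind of thing a proof-of-concept would have to measure.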

In short I think you need to do some more thinking about what the scope
of the advice flags is going to be ...

> (b) store the new advice somewhere so that new FDs for the relation can
> have this advice set for them: clients should just be able to call
> smgradvise() without needing to worry if someone else has already called
> smgropen() for the relation in the past. One problem is how to store
> this: I don't think it can be a field of RelationData, since that is
> transient. Any suggestions?

Something Vadim had wanted to do for years is to decouple the smgr and
lower levels from the existing Relation cache, and have a low-level
notion of "open relation" that only requires having the "RelFileNode"
value to open it.  This would allow eliminating the concept of blind
write, which would be a Very Good Thing.  It would make sense to
associate the advice setting with such low-level relations.  One
possible way to handle the multiple-scan issue is to make the desired
advice part of the low-level open() call, so that you actually have
different low-level relations for seq and random access to a relation.
Not sure if this works cleanly when you take into account issues like
smgrunlink, but it's something to think about.
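
One way to picture the idea of making the advice part of the low-level open call is a small table keyed by file identity plus advice. This is only a sketch under stated assumptions: every name below (DemoRelFileNode, DemoAdvice, DemoLowRel, demo_lowrel_open) is hypothetical, the two-field key is a stand-in for RelFileNode, and no real file is opened.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RelFileNode: just enough to identify a file. */
typedef struct DemoRelFileNode
{
    unsigned int tblNode;
    unsigned int relNode;
} DemoRelFileNode;

typedef enum DemoAdvice
{
    DEMO_ADVICE_NORMAL,
    DEMO_ADVICE_SEQUENTIAL,
    DEMO_ADVICE_RANDOM
} DemoAdvice;

/* One low-level "open relation", keyed by file identity AND advice. */
typedef struct DemoLowRel
{
    int             in_use;
    DemoRelFileNode rnode;
    DemoAdvice      advice;
    int             refcount;
} DemoLowRel;

#define DEMO_MAX_LOWRELS 16
DemoLowRel  demo_lowrels[DEMO_MAX_LOWRELS];

/*
 * Return the entry for (rnode, advice), creating it if needed.
 * Returns NULL when the table is full.
 */
DemoLowRel *
demo_lowrel_open(DemoRelFileNode rnode, DemoAdvice advice)
{
    DemoLowRel *free_slot = NULL;
    int         i;

    for (i = 0; i < DEMO_MAX_LOWRELS; i++)
    {
        DemoLowRel *rel = &demo_lowrels[i];

        if (!rel->in_use)
        {
            if (free_slot == NULL)
                free_slot = rel;    /* remember first free slot */
            continue;
        }
        assert(rel->refcount > 0);
        if (rel->rnode.tblNode == rnode.tblNode &&
            rel->rnode.relNode == rnode.relNode &&
            rel->advice == advice)
        {
            rel->refcount++;        /* same file, same advice: share it */
            return rel;
        }
    }
    if (free_slot == NULL)
        return NULL;

    free_slot->in_use = 1;
    free_slot->rnode = rnode;
    free_slot->advice = advice;
    free_slot->refcount = 1;
    return free_slot;
}
```

With this keying, a seqscan and an indexscan on the same file get different entries, while two seqscans share one. It also hints at the smgrunlink wrinkle: an unlink would presumably have to find every entry for the file regardless of advice, not just one.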
        regards, tom lane
