Re: adding support for posix_fadvise() - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: adding support for posix_fadvise() |
Date | |
Msg-id | 13972.1067870303@sss.pgh.pa.us |
In response to | adding support for posix_fadvise() (Neil Conway <neilc@samurai.com>) |
Responses | Re: adding support for posix_fadvise() |
List | pgsql-hackers |
Neil Conway <neilc@samurai.com> writes:
> So what API is desirable for uses 2-4? I'm thinking of adding a new
> function to the smgr API, smgradvise().

It's a little premature to be inventing APIs when you have no evidence that this will make any useful performance difference. I'd recommend a quick hack to get proof of concept before you bother with nice APIs.

> Given a Relation and an advice, this would:
> (a) propagate the advice for this relation to all the open FDs for the
> relation

"All"? You cannot affect the FDs being used by other backends.

It's fairly unclear to me what the posix_fadvise function is really going to do for files that are being accessed by multiple processes. For instance, is there any value in setting POSIX_FADV_DONTNEED on a WAL file, given that every other backend is going to have that same file open? I would expect that rational kernel behavior would be to ignore this advice unless it's set by the last backend to have the file open --- but I'm not sure we can synchronize the closing of old WAL segments well enough to know which backend is the last to close the file.

A related problem is that the smgr uses the same FD to access the same relation no matter how many scans are in progress. Think about a complex query that is doing both a seqscan and an indexscan on the same relation (a self-join could easily do this). You'd really need to change this if you want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to get set usefully.

In short I think you need to do some more thinking about what the scope of the advice flags is going to be ...

> (b) store the new advice somewhere so that new FDs for the relation can
> have this advice set for them: clients should just be able to call
> smgradvise() without needing to worry if someone else has already called
> smgropen() for the relation in the past. One problem is how to store
> this: I don't think it can be a field of RelationData, since that is
> transient. Any suggestions?

Something Vadim had wanted to do for years is to decouple the smgr and lower levels from the existing Relation cache, and have a low-level notion of "open relation" that only requires having the "RelFileNode" value to open it. This would allow eliminating the concept of blind write, which would be a Very Good Thing. It would make sense to associate the advice setting with such low-level relations.

One possible way to handle the multiple-scan issue is to make the desired advice part of the low-level open() call, so that you actually have different low-level relations for seq and random access to a relation. Not sure if this works cleanly when you take into account issues like smgrunlink, but it's something to think about.

regards, tom lane
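
[Editorial illustration, not part of the original message.] The idea of making the advice part of the low-level open call can be sketched in plain C, outside of PostgreSQL. The names AdvisedFile, advised_open() and demo_two_handles() are hypothetical and exist only for this example; nothing here is smgr code. The sketch assumes a POSIX system that provides posix_fadvise(), and it only shows that each access pattern gets its own descriptor with its own advice. Whether the kernel actually scopes POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM per descriptor is implementation-specific and is not demonstrated here.

```c
#define _XOPEN_SOURCE 700   /* exposes posix_fadvise() and mkstemp() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical low-level handle: one descriptor per (file, advice) pair. */
typedef struct AdvisedFile
{
    int fd;      /* private descriptor, not shared between access patterns */
    int advice;  /* POSIX_FADV_* value applied when the file was opened */
} AdvisedFile;

/*
 * Open 'path' read-only and apply 'advice' to the whole file.
 * Returns 0 on success, or an errno-style code on failure.
 */
int
advised_open(const char *path, int advice, AdvisedFile *out)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return errno;

    /* offset 0 with len 0 covers the file through its end */
    rc = posix_fadvise(fd, 0, 0, advice);
    if (rc != 0)
    {
        /* posix_fadvise() returns the error number directly */
        close(fd);
        return rc;
    }

    out->fd = fd;
    out->advice = advice;
    return 0;
}

/*
 * Open the same scratch file twice, once per access pattern, the way a
 * seqscan and an indexscan on one relation might. Returns 0 on success.
 */
int
demo_two_handles(void)
{
    char path[] = "/tmp/fadvise_demo_XXXXXX";
    AdvisedFile seq, rnd;
    int tmpfd = mkstemp(path);
    int rc;

    if (tmpfd < 0)
        return -1;
    close(tmpfd);

    rc = advised_open(path, POSIX_FADV_SEQUENTIAL, &seq);
    if (rc == 0)
    {
        rc = advised_open(path, POSIX_FADV_RANDOM, &rnd);
        if (rc == 0)
        {
            /* two independent descriptors for the same underlying file */
            assert(seq.fd != rnd.fd);
            close(rnd.fd);
        }
        close(seq.fd);
    }
    unlink(path);
    return rc;
}
```

Calling demo_two_handles() from a main() is enough to try this out. Note that the sketch sidesteps the harder questions raised above: it says nothing about descriptors held by other processes, nor about how an unlink of the underlying file would have to find and close every per-advice handle.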