James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> you mean the order of write out, if we have to do it, is
> important. In the rest of the kernel, we do this with barriers
> which causes ordered grouping of I/O chunks. If we could force a
> similar ordering in the writeout code, is that enough?
Unless the ordering can be limited to particular pairs of pages, I
don't think performance would be at all acceptable. Each data page
has an
associated Log Sequence Number reflecting the last Write-Ahead Log
record which records a change to that page, and the referenced WAL
record must be safely persisted before the data page is allowed to
be written. Currently, when we need to write a dirty page to the
OS, we must ensure that the WAL record is written and fsync'd
first. We also write a WAL record for transaction commit and
fsync it at each COMMIT, before telling the client that the COMMIT
request was successful. (Well, at least by default; they can
choose to set synchronous_commit to off for some or all
transactions.) If a write barrier to control this applied to
everything on the filesystem, performance would be horrible.
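
To make the pairwise ordering concrete, here is a minimal sketch in
Python. It is not PostgreSQL code: ToyWal, ToyPage, modify_page,
write_page and commit are hypothetical names made up for illustration,
and an LSN is simplified to a byte offset in a single log file. It
only shows the rule described above: flush the WAL up to the page's
LSN before writing that page, and flush the commit record before
acknowledging a COMMIT.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size for this toy example


class ToyWal:
    """Append-only log; an LSN is the byte offset just past a record."""

    def __init__(self, path):
        self.f = open(path, "wb")
        self.insert_lsn = 0   # end of the last record appended
        self.flushed_lsn = 0  # everything below this has been fsync'd

    def append(self, record):
        self.f.write(record)
        self.insert_lsn += len(record)
        return self.insert_lsn

    def flush_to(self, lsn):
        if lsn <= self.flushed_lsn:
            return  # already durable, no fsync needed
        self.f.flush()
        os.fsync(self.f.fileno())
        # The fsync covers every record appended so far.
        self.flushed_lsn = self.insert_lsn

    def close(self):
        self.f.close()


class ToyPage:
    def __init__(self, page_no):
        self.page_no = page_no
        self.lsn = 0  # LSN of the last WAL record that changed this page
        self.data = bytearray(PAGE_SIZE)
        self.dirty = False


def modify_page(wal, page, offset, payload):
    """Log the change first, then apply it to the in-memory page."""
    header = b"page %d off %d: " % (page.page_no, offset)
    page.lsn = wal.append(header + payload + b"\n")
    page.data[offset:offset + len(payload)] = payload
    page.dirty = True


def write_page(wal, page, data_file):
    """Write one dirty page, ordered only against its own WAL record."""
    wal.flush_to(page.lsn)  # the WAL-before-data rule
    data_file.seek(page.page_no * PAGE_SIZE)
    data_file.write(bytes(page.data))
    page.dirty = False


def commit(wal, xid):
    """Log the commit and make it durable before reporting success."""
    lsn = wal.append(b"commit %d\n" % xid)
    wal.flush_to(lsn)
    return lsn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wal = ToyWal(os.path.join(d, "wal"))
        with open(os.path.join(d, "data"), "w+b") as data_file:
            page = ToyPage(0)
            modify_page(wal, page, 0, b"hello")
            write_page(wal, page, data_file)
            commit(wal, 1)
        wal.close()
```

The point of the sketch is that write_page() only waits for the WAL
up to that one page's LSN; nothing else on the filesystem is ordered
against it.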
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company