Re: [HACKERS] TODO item - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: [HACKERS] TODO item
Date	February 7, 2000 15:39:17
Msg-id	200002071735.MAA08496@candle.pha.pa.us
In response to	Re: [HACKERS] TODO item (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	fsync alternatives (was: Re: [HACKERS] TODO item) (Alfred Perlstein <bright@wintelcom.net>) Re: [HACKERS] TODO item (Tom Lane <tgl@sss.pgh.pa.us>) RE: [HACKERS] TODO item ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List	pgsql-hackers

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Don't tell me we fsync on every buffer write, and not just at
> > transaction commit?  That is terrible.
> 
> If you don't have -F set, yup.  Why did you think fsync mode was
> so slow?
> 
> > What if we set a flag on the file descriptor stating we dirtied/wrote
> > one of its buffers during the transaction, and cycle through the file
> > descriptors on buffer commit and fsync all involved in the transaction. 
> 
> That's exactly what Tatsuo was describing, I believe.  I think Hiroshi
> has pointed out a serious problem that would make it unreliable when
> multiple backends are running: if some *other* backend fwrites the page
> instead of your backend, and it doesn't fsync until *its* transaction is
> done (possibly long after yours), then you lose the ordering guarantee
> that is the point of the whole exercise...

OK, I understand now.  You are asking whether, if my backend dirties a
buffer but another backend does the write, my backend's fsync() would
flush the buffer that the other backend wrote.

I can't imagine how fsync could flush _only_ the file descriptor buffers
modified by the current process.  It would have to affect all buffers
for the file descriptor.

BSDI says:
    Fsync() causes all modified data and attributes of fd to be moved to a
    permanent storage device.  This normally results in all in-core modified
    copies of buffers for the associated file to be written to a disk.

Looking at the BSDI kernel, there is a user-mode file descriptor table,
which maps to a kernel file descriptor table.  This table can be shared,
so a file descriptor can be open in several places at once, as after a
fork() call.  The kernel table maps to an actual file inode/vnode that
maps to a file.  The only thing that is kept in the kernel file
descriptor table (struct file in BSD) is the current offset in the file.
There is no mapping of who wrote which blocks.
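
To make the descriptor-table point concrete, here is a small sketch in
Python (my choice of language for illustration, not anything from the
backend).  It assumes a POSIX-like system, and the function name
offsets_after_write is made up for this example.  It writes through one
descriptor, then checks the offset seen by a dup()ed descriptor, which
shares the same kernel entry, and by a separately opened descriptor,
which gets its own entry on the same file.

```python
import os
import tempfile


def offsets_after_write(payload: bytes = b"abcd"):
    """Write via one descriptor; report what two other descriptors see.

    Hypothetical helper for illustration only.
    Returns (offset of dup()ed fd, offset of separately opened fd,
             bytes read back through the separately opened fd).
    """
    fd, path = tempfile.mkstemp()
    try:
        shared = os.dup(fd)                   # same kernel entry as fd
        separate = os.open(path, os.O_RDWR)   # new kernel entry, same file
        try:
            os.write(fd, payload)
            # The dup()ed descriptor shares the offset that the write moved.
            shared_off = os.lseek(shared, 0, os.SEEK_CUR)
            # The separately opened descriptor still sits at offset 0 ...
            separate_off = os.lseek(separate, 0, os.SEEK_CUR)
            # ... but it reads the same file data written through fd.
            seen = os.read(separate, len(payload))
            # fsync() takes only a descriptor; nothing in the call says
            # which process or descriptor did the writing.
            os.fsync(separate)
        finally:
            os.close(shared)
            os.close(separate)
    finally:
        os.close(fd)
        os.unlink(path)
    return shared_off, separate_off, seen
```

The only per-descriptor state the sketch can observe is the offset; the
data itself is reachable through every descriptor on the file.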

In fact, I would suggest that any kernel implementation that could track
such things would be pretty broken.  I can imagine cases where a mapping
of blocks to file descriptors would cause compatibility problems.  Those
buffers have to be shared by all processes.

So, I think we are safe if we can either keep that file descriptor open
until commit, or re-open it and fsync it on commit.  That assumes a
re-open is hitting the same file.  My opinion is that we should just
fsync it on close and not worry about a reopen.
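
A rough sketch of the "re-open it and fsync it on commit" option, in
Python for brevity.  Everything named here (DirtyFileTracker, write,
commit) is hypothetical and not taken from the backend code.  It only
tracks writes made by the current process, so it does not by itself
answer the multi-backend ordering concern raised above, and it relies on
the assumption that a re-open reaches the same file.

```python
import os


class DirtyFileTracker:
    """Hypothetical helper: remember files written during a transaction
    and fsync each of them once at commit."""

    def __init__(self):
        self._dirty = set()

    def write(self, path, offset, data):
        """Write data at offset, close the descriptor, mark path dirty."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.lseek(fd, offset, os.SEEK_SET)
            os.write(fd, data)
        finally:
            os.close(fd)          # closed without fsync; flush is deferred
        self._dirty.add(path)

    def commit(self):
        """Re-open every dirty file, fsync it, and return the paths synced."""
        synced = []
        for path in sorted(self._dirty):
            fd = os.open(path, os.O_RDWR)   # assumes re-open hits same file
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced.append(path)
        self._dirty.clear()
        return synced
```

Each file is fsynced once per commit no matter how many of its buffers
were written.  The fsync-on-close alternative would instead call
os.fsync(fd) inside write() just before os.close(fd).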

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
