Bruce Momjian wrote:
> Kevin Brown wrote:
> > Bruce Momjian wrote:
> > > The idea of using this on Unix is tempting, but Tatsuo is using a
> > > threaded backend, so it is a little easier to do. However, it would
> > > probably be pretty easy to write a file of modified file names that the
> > > checkpoint could read and open/fsync/close.
> >
> > Even that's not strictly necessary -- we *do* have shared memory we
> > can use for this, and even when hundreds of tables have been written
> > the list will only end up being a few tens of kilobytes in size (plus
> > whatever overhead is required to track and manipulate the entries).
> >
> > But even then, we don't actually have to track the *names* of the
> > files that have changed, just their RelFileNodes, since there's a
> > mapping function from the RelFileNode to the filename.
>
> But we have to allow an unlimited number of files. Perhaps we could
> just fall back to sync if the shared memory overflows, and shared memory
> is finite.

True.
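
To make that fallback concrete, here's a rough sketch in C of a fixed-size table that remembers which files were written and simply raises an "overflowed" flag once it runs out of room. None of this is from the PG source: `FileId`, `DirtySet`, `MAX_DIRTY`, and both functions are names I made up for illustration, and `FileId` is only a stand-in for whatever a real RelFileNode contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DIRTY 4   /* hypothetical capacity, tiny so overflow is easy to see */

/* Hypothetical stand-in for a RelFileNode; not PostgreSQL's real struct. */
typedef struct { unsigned db; unsigned rel; } FileId;

/* Fixed-size table, as it might live in a shared memory segment. */
typedef struct {
    FileId entries[MAX_DIRTY];
    size_t count;
    bool   overflowed;   /* set once an entry could not be recorded */
} DirtySet;

/* Record that a file was written. Duplicates are ignored. If the table
 * is full, the entry is dropped and the overflow flag is raised. */
static void dirty_set_add(DirtySet *s, FileId id)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++)
        if (s->entries[i].db == id.db && s->entries[i].rel == id.rel)
            return;                      /* already tracked */
    if (s->count == MAX_DIRTY) {
        s->overflowed = true;            /* the list is no longer complete */
        return;
    }
    s->entries[s->count++] = id;
}

/* True when the list is incomplete, so the checkpoint has to fall back
 * to a full sync() instead of fsync'ing the listed files one by one. */
static bool dirty_set_needs_full_sync(const DirtySet *s)
{
    return s->overflowed;
}
```

The checkpointing process would test `dirty_set_needs_full_sync()` first and call sync() if it returns true; otherwise it would walk `entries` and fsync each file. Locking around the table is left out entirely.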

Hmm... perhaps there's another way to do this? Let me explain:

When we do a checkpoint, what we're really doing is writing any
committed transactions in the transaction log to the associated data
files, right?

Or so the theory goes. PG may do something quite different from
that. I'm not terribly familiar with the source, so it may be no
surprise that I'm having difficulty finding any code that converts
transactions stored in the transaction log into changes to the data
files...

Anyway, if a checkpoint really does take transactions and commit them
to the data files, then the transactions themselves contain all the
information we need. So there would be no need to maintain a separate
list: the list has already been stored on disk for us. All we'd have
to do is build a list at checkpoint time and fsync/fdatasync each file
in the list at the very end. The list wouldn't need to be shared
because the only process that would care is the one doing the
checkpointing.
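
Roughly what I have in mind, as a sketch in C. This assumes the checkpointer can walk the log records it is applying and get a file path out of each one; `LogRecord`, `MAX_FILES`, `collect_touched_files()`, and `fsync_files()` are all hypothetical names of mine, not anything in the PG source. Only open(), fsync(), and close() are real calls.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MAX_FILES 64   /* hypothetical cap on the private list */

/* Hypothetical log record: only the one field this sketch needs. */
typedef struct { const char *path; } LogRecord;

/* Collect the distinct paths named by recs[0..nrecs) into out[].
 * Returns the number of distinct paths, or -1 if there are more than
 * MAX_FILES of them (the caller could then fall back to sync()). */
static int collect_touched_files(const LogRecord *recs, size_t nrecs,
                                 const char *out[MAX_FILES])
{
    int n = 0;
    for (size_t i = 0; i < nrecs; i++) {
        int seen = 0;
        for (int j = 0; j < n; j++)
            if (strcmp(out[j], recs[i].path) == 0) { seen = 1; break; }
        if (seen)
            continue;
        if (n == MAX_FILES)
            return -1;
        out[n++] = recs[i].path;
    }
    return n;
}

/* At the very end of the checkpoint: open/fsync/close each file.
 * Returns the number of files that could not be opened, synced,
 * or closed cleanly. */
static int fsync_files(const char *const *paths, int n)
{
    int failures = 0;
    for (int i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDWR);
        if (fd < 0) { failures++; continue; }
        int ok = (fsync(fd) == 0);
        if (close(fd) != 0)
            ok = 0;
        if (!ok)
            failures++;
    }
    return failures;
}
```

The list lives in the checkpointing process's own memory, so no shared memory or locking is involved. Swapping fsync() for fdatasync() would be a one-line change where the platform has it.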

Or so the theory goes. Since I'm having so much trouble finding code
that actually does any of what I describe, I'd have no trouble
believing that how PG works is very different from what I envision...

--
Kevin Brown kevin@sysexperts.com