On Tue, Mar 4, 2014 at 11:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
> > On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > > * Robert Haas (robertmhaas@gmail.com) wrote:
> > >> On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
> > >> <fabriziomello@gmail.com> wrote:
> > >> > Is the TODO item "make an unlogged table logged" [1] a good GSoC project?
> > >>
> > >> I'm pretty sure we found some problems in that design that we couldn't
> > >> figure out how to solve. I don't have a pointer to the relevant
> > >> -hackers discussion off-hand, but I think there was one.
> > >
> > > ISTR the discussion going something along the lines of "we'd have to WAL
> > > log the entire table to do that, and if we have to do that, what's the
> > > point?".
> >
> > No, not really. The issue is more around what happens if we crash
> > part way through. At crash recovery time, the system catalogs are not
> > available, because the database isn't consistent yet and, anyway, the
> > startup process can't be bound to a database, let alone every database
> > that might contain unlogged tables. So the sentinel that's used to
> > decide whether to flush the contents of a table or index is the
> > presence or absence of an _init fork, which the startup process
> > obviously can see just fine. The _init fork also tells us what to
> > stick in the relation when we reset it; for a table, we can just reset
> > to an empty file, but that's not legal for indexes, so the _init fork
> > contains a pre-initialized empty index that we can just copy over.
> >
> > Now, to make an unlogged table logged, you've got to at some stage
> > remove those _init forks. But this is not a transactional operation.
> > If you remove the _init forks and then the transaction rolls back,
> > you've left the system in an inconsistent state. If you postpone the
> > removal until commit time, then you have a problem if it fails,
> > particularly if it works for the first file but fails for the second.
> > And if you crash at any point before you've fsync'd the containing
> > directory, you have no idea which files will still be on disk after a
> > hard reboot.
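
To make the reset described above concrete, here is a rough Python model of
what the startup process does with _init forks. It is only a sketch under
stated assumptions, not the server code: the function name
reset_unlogged_relations, the flat directory layout, and the sample
relfilenode numbers are all made up for illustration. Real relation files
can also have numbered segments, which this model handles only by stripping
the suffix.

```python
import os
import shutil
import tempfile

INIT_SUFFIX = "_init"


def reset_unlogged_relations(dbdir):
    """Simplified model of the crash-recovery reset (hypothetical helper).

    The only input is the directory listing; no catalog lookup is needed.
    """
    names = os.listdir(dbdir)

    # Pass 1: the presence of an _init fork is the sentinel that marks a
    # relfilenode as belonging to an unlogged relation.
    unlogged = {n[: -len(INIT_SUFFIX)] for n in names if n.endswith(INIT_SUFFIX)}

    # Pass 2: throw away every other fork of those relfilenodes.
    for n in names:
        relfilenode = n.split("_", 1)[0].split(".", 1)[0]
        if relfilenode in unlogged and not n.endswith(INIT_SUFFIX):
            os.remove(os.path.join(dbdir, n))

    # Pass 3: copy the _init fork over the main fork. For a table that is an
    # empty file; for an index it is a pre-initialized empty index.
    for relfilenode in unlogged:
        shutil.copyfile(
            os.path.join(dbdir, relfilenode + INIT_SUFFIX),
            os.path.join(dbdir, relfilenode),
        )
    return sorted(unlogged)


def demo():
    """Build a toy database directory, reset it, and return what is left."""
    files = {
        "16384": b"heap rows",        # unlogged table, main fork
        "16384_init": b"",            # its init fork: empty file
        "16384_fsm": b"fsm",          # another fork of the same table
        "16390": b"stale index",      # unlogged index, main fork
        "16390_init": b"EMPTY-INDEX", # its init fork: valid empty index
        "16400": b"logged rows",      # ordinary logged table, no init fork
    }
    with tempfile.TemporaryDirectory() as dbdir:
        for name, data in files.items():
            with open(os.path.join(dbdir, name), "wb") as f:
                f.write(data)
        reset = reset_unlogged_relations(dbdir)
        remaining = {}
        for name in sorted(os.listdir(dbdir)):
            with open(os.path.join(dbdir, name), "rb") as f:
                remaining[name] = f.read()
    return reset, remaining
```

The point of the model is that the decision depends only on which files
exist, which is why removing an _init fork at the wrong moment (or losing
track of whether the removal reached disk) changes what recovery does.
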
>
> Can't that be solved by just creating the permanent relation in a new
> relfilenode? That's equivalent to a rewrite, yes, but we need to do that
> for anything but wal_level=minimal anyway.
>
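
One way to read the new-relfilenode suggestion is sketched below in Python.
This is my interpretation, not an agreed design: the catalog is modelled as
a plain dict, and make_logged, fsync_dir, and the fail_before_commit flag
are hypothetical names introduced for the example. The idea is that the old
files, including the _init fork, are left alone until the switch to the new
relfilenode has committed, so a rollback or crash before that point leaves
the unlogged relation exactly as it was.

```python
import os
import tempfile


def fsync_dir(path):
    """Flush a directory entry change to disk (POSIX; not valid on Windows)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def make_logged(dbdir, catalog, table, new_relfilenode, fail_before_commit=False):
    """Toy model: rewrite the relation into a new relfilenode, then switch.

    `catalog` maps a table name to its current relfilenode and stands in for
    the transactional catalog update. Returns True on commit, False on abort.
    """
    old = catalog[table]
    new_path = os.path.join(dbdir, new_relfilenode)
    try:
        # Copy the data into a brand-new file that has no _init fork.
        with open(os.path.join(dbdir, old), "rb") as src, open(new_path, "wb") as dst:
            dst.write(src.read())
            dst.flush()
            os.fsync(dst.fileno())
        # Make sure the new directory entry itself is durable.
        fsync_dir(dbdir)
        if fail_before_commit:
            raise RuntimeError("simulated rollback")
    except Exception:
        # Abort: drop the new file; the old relfilenode and its _init fork
        # were never touched, so the relation is still a valid unlogged one.
        if os.path.exists(new_path):
            os.remove(new_path)
        return False

    # Commit point: the catalog now points at the new relfilenode.
    catalog[table] = new_relfilenode

    # Post-commit cleanup. A failure here only leaves orphaned files behind;
    # nothing refers to the old relfilenode any more.
    for name in (old, old + "_init"):
        try:
            os.remove(os.path.join(dbdir, name))
        except OSError:
            pass
    return True
```

In this model the non-transactional file removals only ever happen to files
that are already unreferenced, which is the property the in-place approach
lacks.
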