Re: GSoC proposal - "make an unlogged table logged" - Mailing list pgsql-hackers

From Robert Haas
Subject Re: GSoC proposal - "make an unlogged table logged"
Msg-id CA+TgmoZPer6CXmujPx2YU4y6rn4JXE1rVWFHYm089fJ9Jvs33Q@mail.gmail.com
In response to Re: GSoC proposal - "make an unlogged table logged"  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: GSoC proposal - "make an unlogged table logged"
List pgsql-hackers
On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
>> On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > * Robert Haas (robertmhaas@gmail.com) wrote:
>> >> On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
>> >> <fabriziomello@gmail.com> wrote:
>> >> > Is the TODO item "make an unlogged table logged" [1] a good GSoC project?
>> >>
>> >> I'm pretty sure we found some problems in that design that we couldn't
>> >> figure out how to solve.  I don't have a pointer to the relevant
>> >> -hackers discussion off-hand, but I think there was one.
>> >
>> > ISTR the discussion going something along the lines of "we'd have to WAL
>> > log the entire table to do that, and if we have to do that, what's the
>> > point?".
>>
>> No, not really.  The issue is more around what happens if we crash
>> part way through.  At crash recovery time, the system catalogs are not
>> available, because the database isn't consistent yet and, anyway, the
>> startup process can't be bound to a database, let alone every database
>> that might contain unlogged tables.  So the sentinel that's used to
>> decide whether to flush the contents of a table or index is the
>> presence or absence of an _init fork, which the startup process
>> obviously can see just fine.  The _init fork also tells us what to
>> stick in the relation when we reset it; for a table, we can just reset
>> to an empty file, but that's not legal for indexes, so the _init fork
>> contains a pre-initialized empty index that we can just copy over.
>>
>> Now, to make an unlogged table logged, you've got to at some stage
>> remove those _init forks.  But this is not a transactional operation.
>> If you remove the _init forks and then the transaction rolls back,
>> you've left the system in an inconsistent state.  If you postpone the
>> removal until commit time, then you have a problem if it fails,
>> particularly if it works for the first file but fails for the second.
>> And if you crash at any point before you've fsync'd the containing
>> directory, you have no idea which files will still be on disk after a
>> hard reboot.
>
> Can't that be solved by just creating the permanent relation in a new
> relfilenode? That's equivalent to a rewrite, yes, but we need to do that
> for anything but wal_level=minimal anyway.

Yes, that would work.  I've tended to view optimizing away the
relfilenode copy as an indispensable part of this work, but that might
be wrongheaded.  It would certainly be a lot easier to make this
happen if we didn't insist on that.
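
As a rough illustration of the new-relfilenode idea discussed above (a toy
model, not PostgreSQL code): the sketch below treats a directory of plain files
as the data directory, a dict as the catalog, and a "<relfilenode>_init" file
as the init fork. The function names, the dict catalog, and the example file
names are all assumptions made for illustration. It assumes a POSIX system
where a directory can be opened and fsync'd. The point it tries to show is that
the old files, including the _init fork, are never modified, so a crash or
rollback before the catalog switch leaves an ordinary unlogged table behind.

```python
import os
import shutil

INIT_SUFFIX = "_init"  # suffix of the init fork in this toy model


def fsync_dir(path):
    """Flush directory entries so newly created files survive a hard reboot (POSIX)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def reset_unlogged(datadir):
    """Toy crash-recovery pass: the presence of an _init fork is the only sentinel.

    Every main fork that has a matching _init file is overwritten with a copy
    of that _init file. No catalog lookup is involved.
    """
    for name in sorted(os.listdir(datadir)):
        if name.endswith(INIT_SUFFIX):
            main_fork = name[: -len(INIT_SUFFIX)]
            shutil.copyfile(os.path.join(datadir, name),
                            os.path.join(datadir, main_fork))


def copy_to_new_relfilenode(datadir, catalog, relname, new_node):
    """Copy the relation into a fresh file that has no _init fork.

    The catalog entry is switched only after the copy is durable. The dict
    assignment stands in for a transactional catalog update (hypothetical).
    Returns the old relfilenode so the caller can drop it after commit.
    """
    old_node = catalog[relname]
    src_path = os.path.join(datadir, old_node)
    dst_path = os.path.join(datadir, new_node)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        os.fsync(dst.fileno())
    fsync_dir(datadir)
    catalog[relname] = new_node
    return old_node
```

In this model, a crash-recovery pass that runs after the switch resets only the
old file, because the new one never had an _init fork. A real implementation
would also have to WAL-log the copied contents for anything but
wal_level=minimal, which this sketch does not attempt.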

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


