Re: [PERFORM] Unlogged tables - Mailing list pgsql-performance

From Stephen Frost
Subject Re: [PERFORM] Unlogged tables
Date
Msg-id 20170809151226.GK4628@tamriel.snowman.net
Whole thread Raw
In response to Re: [PERFORM] Unlogged tables  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-performance
David, all,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Wed, Aug 9, 2017 at 3:39 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>
> > This triggers a table rewrite and makes sure that all the data gets
> > WAL-logged. The cost to pay for durability.

That's not entirely accurate- there are certain cases where we don't
have to WAL-log the data, in fact we've got a specific optimization to
avoid WAL logging when it isn't necessary (see
src/backend/commands/copy.c:2392 or so), and the data will still be
durable once the transaction commits.  There are limitations there
though, of course, but it sounds like those are ones the OP may be happy
to live with in this case.

> > > Is there a way to get my cake and eat it too?
> >
> > Not completely. Making data durable will have a cost at the end, but
> > you can leverage it.
>
> Aren't you over-playing the role of the WAL in providing durability.  An
> unlogged table remains intact after a clean shutdown and so is "durable" if
> one considers the primary "permanence" aspect of the word.

In database terms, however, durable is intended to be in the face of a
crash and not just a clean shutdown, otherwise we wouldn't need to bother
with this whole WAL thing at all.

> The trade-off the OP wishes for is "lose crash-safety to gain write-once
> (to the data files) performance".  Seeming having this on a per-table basis
> would be part of the desirability.  It sounds like OP would be willing to
> place the table into "read only" mode in order to ensure this - which is
> something that is not presently possible.  I could envision that putting an
> unlogged table into read-only mode would cause the system to ensure that
> the data files are fully populated and then set a flag in the catalog that
> informs the crash recovery process to go ahead and omit truncating that
> particular unlogged table since the data files are known to be accurate.

This does sound like a pretty interesting idea, though not really
necessary unless OP has a mix of data that needs to be WAL-log'd and
data that doesn't.

What I believe OP is really looking for here, specifically, is using
wal_level = minimal while creating the table (or truncating it) within
the same transaction as the data load is done.  That will avoid having
the table's contents written into the WAL, and PG will treat it as a
regular table post-commit, meaning that it won't be truncated on a
database crash.

Thanks!

Stephen

Attachment

pgsql-performance by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: [PERFORM] Unlogged tables
Next
From: Jeff Janes
Date:
Subject: Re: [PERFORM] Unlogged tables