Re: unlogged tables - Mailing list pgsql-performance

From Jeff Janes
Subject Re: unlogged tables
Date
Msg-id CAMkU=1z_QRgLcGd1YTHcScbL4GFBcmPsuPxw92qy6HRXtHiscQ@mail.gmail.com
Whole thread Raw
In response to Re: unlogged tables  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: unlogged tables
Re: unlogged tables
List pgsql-performance
On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, April 13, 2015, Matheus de Oliveira <matioli.matheus@gmail.com> wrote:

On Mon, Apr 13, 2015 at 4:31 PM, dgabriel <gabriel.dodan@gmail.com> wrote:
"In the event of a normal shutdown, we can flush all the writes to disk
so we know all the data has been written, so there is no need to truncate."

Isn't possible to periodically flush data to disk and in case of crush
postgres to load only the data that existed at last flush? The periodic
flush could be configurable, for example every 30 minutes or after x rows
updated/inserted.

There is no such facility implemented for UNLOGGED TABLEs. That could be a feature request though.

One way would be to lock dirty buffers from unlogged relations into shared_buffers (which hardly seems like a good thing) until the start of a "super-checkpoint" and then write them all out as fast as possible (which kind of defeats checkpoint_completion_target).  And then if the crash happened during a super-checkpoint, the data would still be inconsistent and need to be truncated.
 


Well, that is half right anyway.  UNLOGGED tables obey checkpoints just like any other table. 

Do they?  I thought they only obeyed shutdown checkpoints, not online checkpoints.  I do remember some changes around this area, but none that completely reverted that logic.

 
The missing feature is an option to leaved restored the last checkpoint.  Instead, not knowing whether there were changes since the last checkpoint, the system truncated the relation.

What use case is there for a behavior that the last checkpoint data is left on the relation upon restarting - not knowing whether it was possible the other data could have been written subsequent?

I would like a way to have unlogged tables be available on a replica provided that no changes were made to them between the pg_basebackup and the recovery point.

My use case is that I mark certain read-only-after-bulk-loading tables as unlogged solely to avoid blowing out the log archive during the loading phase and refresh phase.  This is stuff like vendor catalogs, NCBI datasets, ChEMBL datasets, etc, which can simply be re-derived from the reference.  It would be nice if these were still available (without having to repeat the ETL) after crashes provided they were not written to since a checkpoint, and available on cloned test servers without having to repeat the ETL on those as well.
 
As for "maybe its corrupt, maybe it isn't, but lets keep them anyway", yeah, I have little use for that.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Jim Nasby
Date:
Subject: Re: unlogged tables
Next
From: "David G. Johnston"
Date:
Subject: Re: unlogged tables