Re: unlogged tables - Mailing list pgsql-performance

From David G. Johnston
Subject Re: unlogged tables
Date
Msg-id CAKFQuwZhXngoQ9xG9N0fvQYVTvXQoO4cO8r-ZYHN+xLWiJEqCQ@mail.gmail.com
Whole thread Raw
In response to Re: unlogged tables  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: unlogged tables  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-performance
On Mon, Apr 13, 2015 at 4:49 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, April 13, 2015, Matheus de Oliveira <matioli.matheus@gmail.com> wrote:

On Mon, Apr 13, 2015 at 4:31 PM, dgabriel <gabriel.dodan@gmail.com> wrote:
"In the event of a normal shutdown, we can flush all the writes to disk
so we know all the data has been written, so there is no need to truncate."

Isn't possible to periodically flush data to disk and in case of crush
postgres to load only the data that existed at last flush? The periodic
flush could be configurable, for example every 30 minutes or after x rows
updated/inserted.

There is no such facility implemented for UNLOGGED TABLEs. That could be a feature request though.

One way would be to lock dirty buffers from unlogged relations into shared_buffers (which hardly seems like a good thing) until the start of a "super-checkpoint" and then write them all out as fast as possible (which kind of defeats checkpoint_completion_target).  And then if the crash happened during a super-checkpoint, the data would still be inconsistent and need to be truncated.
 


Well, that is half right anyway.  UNLOGGED tables obey checkpoints just like any other table. 

Do they?  I thought they only obeyed shutdown checkpoints, not online checkpoints.  I do remember some changes around this area, but none that completely reverted that logic.


​I vaguely recall that conversation now...I'm not positive on the exact mechanics here and, as it pertains to the OP, the difference you describe is immaterial since in either case the status quo mandates an "all or nothing" approach to an unlogged table's contents.​

 
 
The missing feature is an option to leaved restored the last checkpoint.  Instead, not knowing whether there were changes since the last checkpoint, the system truncated the relation.

What use case is there for a behavior that the last checkpoint data is left on the relation upon restarting - not knowing whether it was possible the other data could have been written subsequent?

I would like a way to have unlogged tables be available on a replica provided that no changes were made to them between the pg_basebackup and the recovery point.

My use case is that I mark certain read-only-after-bulk-loading tables as unlogged solely to avoid blowing out the log archive during the loading phase and refresh phase.  This is stuff like vendor catalogs, NCBI datasets, ChEMBL datasets, etc, which can simply be re-derived from the reference.  It would be nice if these were still available (without having to repeat the ETL) after crashes provided they were not written to since a checkpoint, and available on cloned test servers without having to repeat the ETL on those as well.
 

​My gut reaction is that those should be in their own clusters and accessed via postgres_fdw...

That particular use-case would probably best be served with a separate replication channel which pushes data files from the primary to the slaves and allows for the slave to basically "rewrite" its existing table by pointing to the newly supplied version.  Some kind of "CREATE STATIC TABLE" and "PUSH STATIC TABLE TO {all | replica name}" command combo...though ideally with less manual intervention...

David J.​
 

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: unlogged tables
Next
From: Jim Nasby
Date:
Subject: Re: unlogged tables