Re: unlogged tables - Mailing list pgsql-performance

From Jim Nasby
Subject Re: unlogged tables
Date
Msg-id 552C7F41.6070101@BlueTreble.com
Whole thread Raw
In response to Re: unlogged tables  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: unlogged tables
List pgsql-performance
On 4/13/15 7:32 PM, David G. Johnston wrote:
>         The missing feature is an option to leaved restored the last
>         checkpoint.  Instead, not knowing whether there were changes
>         since the last checkpoint, the system truncated the relation.
>
>         What use case is there for a behavior that the last checkpoint
>         data is left on the relation upon restarting - not knowing
>         whether it was possible the other data could have been written
>         subsequent?
>
>
>     I would like a way to have unlogged tables be available on a replica
>     provided that no changes were made to them between the pg_basebackup
>     and the recovery point.
>
>
>     My use case is that I mark certain read-only-after-bulk-loading
>     tables as unlogged solely to avoid blowing out the log archive
>     during the loading phase and refresh phase.  This is stuff like
>     vendor catalogs, NCBI datasets, ChEMBL datasets, etc, which can
>     simply be re-derived from the reference.  It would be nice if these
>     were still available (without having to repeat the ETL) after
>     crashes provided they were not written to since a checkpoint, and
>     available on cloned test servers without having to repeat the ETL on
>     those as well.
>
>
> ​My gut reaction is that those should be in their own clusters and
> accessed via postgres_fdw...

Likely to produce really crappy plans if the tables are of any real size...

> That particular use-case would probably best be served with a separate
> replication channel which pushes data files from the primary to the
> slaves and allows for the slave to basically "rewrite" its existing
> table by pointing to the newly supplied version.  Some kind of "CREATE
> STATIC TABLE" and "PUSH STATIC TABLE TO {all | replica name}" command
> combo...though ideally with less manual intervention...

You still have the same problem of knowing if someone has scribbled on
the data since the last checkpoint.

There's been recent discussion of adding support for read-only tables.
If we had those, we might be able to support something like...

INSERT INTO unlogged;
ALTER TABLE unlogged SET READ ONLY;
CHECKPOINT;
/* take backup */

This should be safe as long as we WAL log changes to read-only status
(which presumably we would).

How much work that would entail though, I don't know.

Ultimately you still have to get the data over to the other machine
anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL
logging of bulk inserts (and especially COPY into a known empty table) a
lot more efficient.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


pgsql-performance by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: unlogged tables
Next
From: "David G. Johnston"
Date:
Subject: Re: unlogged tables