Re: Survey on backing up unlogged tables: help us with PostgreSQL development! - Mailing list pgsql-general

Man, the number of misunderstandings in this thread is staggering.
Let me try to explain what the proposed feature will and will not do.

1. The system catalog entries for all tables will be wal-logged.
So schema (DDL) will survive a crash.  There wouldn't be any way
to make it not do that, because we can't wal-log only some updates
to a particular table, and that includes the catalogs in particular.

2. What's proposed as the new feature is that specific non-system
tables can be marked as unlogged, meaning that WAL entries won't
be made for changes in those tables' contents (nor their indexes'
contents).  So we can't guarantee that the contents of such tables
will be correct or consistent after a crash.  The proposed feature
deals with this by forcibly truncating all such tables after a crash,
thus ensuring that they're consistent though not populated.  So the
possible use-cases for such tables are limited to where (a) you can
repopulate the tables on demand, or (b) you don't really care about
losing data on a crash.

3. There's a lot of wishful thinking here about what constitutes a
crash.  A backend crash *is* a crash, even if the postmaster keeps
going.  Data that had been in shared buffers doesn't get written out
in such a scenario (and if we tried, it might be corrupt anyway).  So
unlogged tables would be corrupt and in need of truncation after such an
event.  Obviously, the same goes for an OS-level crash or power failure.

4. The last bit of discussion on -hackers concerned what to do in
the case where the server got shut down cleanly.  If it was shut
down cleanly, then any data for unlogged tables would have been
written out from shared buffers ... but did the data make it to disk?
There's no easy way to know that.  In the event of an OS crash or
power failure shortly after server shutdown, it's possible that
the unlogged tables would be corrupt.  So Robert's initial proposal
includes truncating unlogged tables at any database startup, even
if the previous shutdown was clean.  Some (including me) are arguing
that that is unnecessarily strict; but you do have to realize that
you're taking some risk with data validity if it doesn't do that.

The bottom line here is that you really can only use the feature
for data that you're willing to accept losing on no notice.
Allowing the data to persist across clean shutdowns would probably
improve usability a bit, but it's not changing that fundamental fact.

            regards, tom lane

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: strange row count estimates with conditions on multiple column
Next
From: Tomas Vondra
Date:
Subject: Re: strange row count estimates with conditions on multiple column