Thread: Please help me understand unlogged tables
If a table is set to unlogged is it inherently non-durable? That, is any crash or unsafe shutdown _must_ result in truncation upon recovery?
I can imagine a table that is bulk loaded in a warehousing scenario and then sitting statically could be safe, but maybe the question becomes how could the system know it is unchanged if it isn't logged...
Would a correct solution be to bulk load into an unlogged table and then (fairly cheaply?) set the table back to being logged?
If I my mental model is correct the table would be liable to being lost to a crash during loading but once set back to logged it is durable. Is that correct?
Is it the case that setting a large table to logged status after loading it unlogged is a reasonably cheap operation?
Thank you,
Alex Stoddard
On Wed, Jan 31, 2018 at 8:52 AM, Alexander Stoddard <alexander.stoddard@gmail.com> wrote:
If a table is set to unlogged is it inherently non-durable? That, is any crash or unsafe shutdown _must_ result in truncation upon recovery?
Yes.
I can imagine a table that is bulk loaded in a warehousing scenario and then sitting statically could be safe, but maybe the question becomes how could the system know it is unchanged if it isn't logged...
Right, that is the problem. It would be nice to solve it, but there are no immediate plans that I know of to implement that. It is hard, because during recovery the system can't query system catalog tables to get information out of them. That is why unlogged tables are indicate by the existence of certain files in the filesystem, so the system can know they are unlogged without querying the system catalogs to find out.
Would a correct solution be to bulk load into an unlogged table and then (fairly cheaply?) set the table back to being logged?
Unfortunately, it is not cheap for large tables.
I have some large datasets which I could always regenerate if needed, which I don't want blowing out my wal archive. I've resigned myself to just reloading them after a crash (or after using pg_basebackup to reclone production to make a fresh test/dev server). If the ETL step from the raw source is expensive, then I \copy the table out to a file, and then \copy it back in after a crash, to save on the ET time.
Cheers,
Jeff
On Wed, Jan 31, 2018 at 10:52:38AM -0600, Alexander Stoddard wrote: > If a table is set to unlogged is it inherently non-durable? That, is any > crash or unsafe shutdown _must_ result in truncation upon recovery? Yes, they are designed like that. Upon recovery all unlogged tables are re-initialized. What happens internally is that their init fork files gets copied to become the one in use. > I can imagine a table that is bulk loaded in a warehousing scenario and > then sitting statically could be safe, but maybe the question becomes how > could the system know it is unchanged if it isn't logged... > > Would a correct solution be to bulk load into an unlogged table and then > (fairly cheaply?) set the table back to being logged? Note that switching a table from unlogged to logged using ALTER TABLE generates a large amount of WAL, as you need to make this table's data consistent it is necessary to copy a full image of it. > If I my mental model is correct the table would be liable to being lost to > a crash during loading but once set back to logged it is durable. Is that > correct? Yes, ALTER TABLE makes that possible but be careful about the WAL generated because of the table rewrite. > Is it the case that setting a large table to logged status after loading it > unlogged is a reasonably cheap operation? If you load a logged table the cost would be the same as switching it from unlogged to logged. So what matters is how much operation is generated between the time you created the table, like random writes on it. For an initial load on a very large table, you could reduce wal_level temporarily to minimal, and make the WAL generated less painful. For a one-time load this can matter. -- Michael