Re: [PERFORM] Unlogged tables - Mailing list pgsql-performance

From Michael Paquier
Subject Re: [PERFORM] Unlogged tables
Date
Msg-id CAB7nPqTaPvASbn3Sqi3J5c=QnCrZS3VNqFG9OHKqLt8Su36VmQ@mail.gmail.com
Whole thread Raw
In response to [PERFORM] Unlogged tables  ("ldh@laurent-hasson.com" <ldh@laurent-hasson.com>)
Responses Re: [PERFORM] Unlogged tables
List pgsql-performance
On Wed, Aug 9, 2017 at 5:20 AM, ldh@laurent-hasson.com
<ldh@laurent-hasson.com> wrote:
> We have a fairly large static dataset that we load into Postgres. We made
> the tables UNLOGGED and saw a pretty significant performance improvement for
> the loading. This was all fantastic until the server crashed and we were
> surprised to see during a follow up demo that the data had disappeared... Of
> course, it's all our fault for not understanding the implications of
> UNLOGGED proprely.

This is documented.

> However, our scenario is truly a set of tables with 100's of millions of
> rows that are effectively WORMs: we write them once only, and then only read
> from them afterwards. As such, they could not be possibly corrupted
> post-load (i think) during a server crash (short of physical disk
> defects...).
>
> I'd like to have the performance improvement during a initial batch insert,
> and then make sure the table remains after "unclean" shutdowns, which, as
> you might have it, includes a regular Windows server shut down during
> patching for example. So unlogged tables in practice are pretty flimsy.

All the data that you want to keep needs to be durable anyway, so you
will need to WAL-log it, and full page writes of those relation pages
will need to be created at least once. After you get past the
checkpoint the data will still be around. If you want to improve the
performance once, there are a couple of tricks, like switching
wal_level to minimal, preferring COPY over multi-value INSERT, batch a
lot of them in the same transaction. Of course you can as well
increase wal_max_size to trigger less checkpoints, or use
synchronous_commit = off to reduce fsync costs.

> I tried to ALTER ... SET LOGGED, but that takes a VERY long time and pretty
> much negates the initial performance boost of loading into an unlogged
> table.

This triggers a table rewrite and makes sure that all the data gets
WAL-logged. The cost to pay for durability.

> Is there a way to get my cake and eat it too?

Not completely. Making data durable will have a cost at the end, but
you can leverage it.
--
Michael


pgsql-performance by date:

Previous
From: "ldh@laurent-hasson.com"
Date:
Subject: [PERFORM] Unlogged tables
Next
From: "David G. Johnston"
Date:
Subject: Re: [PERFORM] Unlogged tables