RE: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Disable WAL logging to speed up data loading
Date
Msg-id OSBPR01MB4888672EC3E30BECB28DD0E3EDDC0@OSBPR01MB4888.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Disable WAL logging to speed up data loading  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi, Michael


Thank you for your attention to this thread.
On Friday, December 25, 2020 4:09 PM  Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Dec 03, 2020 at 03:52:47AM +0000, tsunakawa.takay@fujitsu.com
> wrote:
> > The code looks good, and the performance seems to be nice, so I marked
> > this ready for committer.
>
> FWIW, I am extremely afraid of this proposal because this is basically a
> footgun able to corrupt customer instances, and I am ready to bet that people
> would switch wal_level to none because they see a lot of benefits in doing so
> at first sight, until the host running the server is plugged off and they need to
> use pg_resetwal in urgency to bring an instance online.
In the patch, I added plenty of descriptions (especially cautions) to the documents.
At least, when users notice the existence of "none" parameter for wal_level,
they cannot avoid having a look at such cautions.
Accordingly, it's really impossible that they see only the merits.

In terms of pg_resetwal,
it should not happen that they use it to utilize the server
corrupted by failure of any operation during wal_level=none.

I documented clearly
this wal_level is designed to make
the server never start up again when any unexpected crash is detected
and thus users need to recreate the whole cluster again.

> Users have already
> the option to make things go bad, just by disabling full page writes or fsync,
> but I really don't think that we should put in their hands more options able to
> break instances, nor should we try to spend more efforts in having more
> "protections" that would trigger only once the instance is already fried.
>
> Perhaps this is something that Horiguchi-san pointed out upthread in [1]
> (last sentence of first paragraph), but did you consider that it is already
> possible to do bulk-loading with a minimal amount of WAL generated as long
> as you do the COPY within the transaction that created the table?  Quoting
> the docs in [2]:
> "COPY is fastest when used within the same transaction as an earlier
> CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be
> written, because in case of an error, the files containing the newly loaded data
> will be removed anyway. However, this consideration only applies when
> wal_level is minimal as all commands must write WAL otherwise."
>
> Upgrade scenarios have been mentioned in this case as being a pain when it
> comes to take advantage of this optimization.  Wouldn't it be safer if we took
> a client approach instead, where restores are able to load the data with a
> cheap succession of commands by loading the data in a transaction done
> after a TRUNCATE?
In a data warehouse environment,
the environment of our scenario in this thread in mind,
I think that to truncate target table
beforehand in the same transaction is not always realistic solution.
Imagine an appended data loading case to execute bulk data load
into one specific table with a bunch of records.

Best Regards,
    Takamichi Osumi



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Commit fest manager for 2021-01
Next
From: Masahiro Ikeda
Date:
Subject: Re: Add session statistics to pg_stat_database