RE: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: Disable WAL logging to speed up data loading |
Date | |
Msg-id | OSBPR01MB4888672EC3E30BECB28DD0E3EDDC0@OSBPR01MB4888.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Disable WAL logging to speed up data loading (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
Hi, Michael Thank you for your attention to this thread. On Friday, December 25, 2020 4:09 PM Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Dec 03, 2020 at 03:52:47AM +0000, tsunakawa.takay@fujitsu.com > wrote: > > The code looks good, and the performance seems to be nice, so I marked > > this ready for committer. > > FWIW, I am extremely afraid of this proposal because this is basically a > footgun able to corrupt customer instances, and I am ready to bet that people > would switch wal_level to none because they see a lot of benefits in doing so > at first sight, until the host running the server is plugged off and they need to > use pg_resetwal in urgency to bring an instance online. In the patch, I added plenty of descriptions (especially cautions) to the documents. At least, when users notice the existence of "none" parameter for wal_level, they cannot avoid having a look at such cautions. Accordingly, it's really impossible that they see only the merits. In terms of pg_resetwal, it should not happen that they use it to utilize the server corrupted by failure of any operation during wal_level=none. I documented clearly this wal_level is designed to make the server never start up again when any unexpected crash is detected and thus users need to recreate the whole cluster again. > Users have already > the option to make things go bad, just by disabling full page writes or fsync, > but I really don't think that we should put in their hands more options able to > break instances, nor should we try to spend more efforts in having more > "protections" that would trigger only once the instance is already fried. > > Perhaps this is something that Horiguchi-san pointed out upthread in [1] > (last sentence of first paragraph), but did you consider that it is already > possible to do bulk-loading with a minimal amount of WAL generated as long > as you do the COPY within the transaction that created the table? Quoting > the docs in [2]: > "COPY is fastest when used within the same transaction as an earlier > CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be > written, because in case of an error, the files containing the newly loaded data > will be removed anyway. However, this consideration only applies when > wal_level is minimal as all commands must write WAL otherwise." > > Upgrade scenarios have been mentioned in this case as being a pain when it > comes to take advantage of this optimization. Wouldn't it be safer if we took > a client approach instead, where restores are able to load the data with a > cheap succession of commands by loading the data in a transaction done > after a TRUNCATE? In a data warehouse environment, the environment of our scenario in this thread in mind, I think that to truncate target table beforehand in the same transaction is not always realistic solution. Imagine an appended data loading case to execute bulk data load into one specific table with a bunch of records. Best Regards, Takamichi Osumi
pgsql-hackers by date: