Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject Disable WAL logging to speed up data loading
Date
Msg-id TYAPR01MB29901EBE5A3ACCE55BA99186FE320@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
Responses Re: Disable WAL logging to speed up data loading  (Amul Sul <sulamul@gmail.com>)
Re: Disable WAL logging to speed up data loading  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
Hello,


We'd like to propose a feature to disable WAL to speed up data loading.  This was inspired by a feature added in the
latestMySQL.  I wish you won't fear this feature... 


BACKGROUND
========================================

This branches off from [1] as mentioned therein.  Briefly speaking, a customer wants to shorten the time for nightly
loadingof data into their data warehouse as much as possible to be prepared for using the data warehouse for new
things.

Currently, they are using Oracle's SQL*Loader with its no-logging feature.  They want a similar feature to migrate to
Postgres. Other than the data loading performance, they don't want to be concerned about the storage for large volumes
ofWAL. 

In [1], we thought about something like Oracle's per-table no-logging feature, but it seems difficult (or at least not
easy.) Meanwhile, I found another feature added in the latest MySQL 8.0.21 [2].  This proposal follows it almost
directly. That satisfies the customer request. 

As an aside, it's also conceivable that in the near future, users could see the WAL bottleneck (WAL buffer or disk)
whenthey utilize the parallel COPY that is being developed in the community. 


FUNCTIONAL SPECIFICATION
========================================

Add a new value 'none' to the server configuration parameter wal_level.  With this setting:

* No WAL is emitted.

* The server refuses to start (pg_ctl start fails) after an abnormal shutdown due to power outage, pg_ctl's immediate
shutdown,etc, showing a straightforward message like MySQL. 

* Features like continuous archiving, pg_basebackup, and streaming/logical replication that requires wal_level >=
replicaare not available. 

* The user can use all features again if you shut down the server successfully after data loading and reset wal_level
toa value other than none.  He needs to take a base backup or rebuild the replication standby after restarting the
server.


In addition to the cosmetic modifications to the manual articles that refer to wal_level, add a clause or paragraphs to
thefollowing sections to let users know the availability of this feature. 

14.4. Populating a Database
18.6.1. Upgrading Data via pg_dumpall


PROGRAM DESIGN (main point only)
========================================

As in the bootstrap mode (during initdb), when wal_level = none, XLogInsert() does nothing and just returns a fixed
value,which is the tail of the last shutdown checkpoint WAL record.  As a result, the value is set to the relation page
header'sLSN field. 

In addition, it might be worth having XLogBeginInsert() and XLogRec...() to check wal_level and just return.  I don't
expectmuch from this, but it may be interesting to give it a try to see the squeezed performance. 

StartupXLOG() checks the wal_level setting in pg_control and quits the startup with ereport(FATAL) accordingly.


[1]
Implement UNLOGGED clause for COPY FROM
https://www.postgresql.org/message-id/OSBPR01MB488887C0BDC5129C65DFC5E5ED640@OSBPR01MB4888.jpnprd01.prod.outlook.com

[2]
Disabling Redo Logging
https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging


Regards
Takayuki Tsunakawa




pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Assertion failure with barriers in parallel hash join
Next
From: Hamid Akhtar
Date:
Subject: Re: Improved Cost Calculation for IndexOnlyScan