> If the commit doesn't complete all of the newly created pages are junk.
> Otherwise, you have a crash-recoverable state for those tables as regards
> those specific pages.
How would we track that and know which pages are junk?
Every one of those pages would contain a single dead transaction id. If there is bookkeeping that needs to happen, that could be WAL-logged - the goal here being not to avoid all WAL but to avoid data WAL. I don't know enough about the internals here to be more specific.
> Conceptually, we need an ability to perform a partial CHECKPOINT that names
> specific tables, and make sure the crash-recovery works for those tables
> while figuring out what amount of effort to expend on informing the dba and
> alerting/preventing features that require wal from using those tables.
Yeah, seems pretty complicated.
Did you see an issue with the basic idea I proposed earlier, whereby an unlogged table could become 'logged', while we are at wal_level=minimal, by essentially checkpointing it - locking it, forcing out any buffers we have associated with it, and then fsync'ing it - without actually writing it into the WAL? I'm not sure how much of that is already done in the unlogged->logged process, but I would guess most of it.
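For context, this is essentially a generalization of the bulk-load optimization PostgreSQL already performs at wal_level=minimal: data written into a relation created (or truncated) in the same transaction can skip data WAL, and the relation files are flushed to disk at commit instead. A sketch of that existing behavior (table name and path are illustrative):

```sql
-- Existing wal_level = minimal optimization this idea builds on.
BEGIN;
CREATE TABLE bulk_load (id int, payload text);
-- At wal_level = minimal, this COPY can skip writing data WAL because
-- the table was created in the same transaction.
COPY bulk_load FROM '/path/to/data.csv' WITH (FORMAT csv);
COMMIT;  -- commit fsyncs the new relation files, giving crash safety without data WAL
```

The proposal above would extend the same flush-and-fsync-at-commit mechanism to a pre-existing table.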
That is basically half of what is described above - the part at commit where the relation is persisted to disk. What your earlier description seems to be missing is the part about temporarily making a logged relation unlogged. I envision that as being part of a transaction rather than a permanent attribute of the table: a storage parameter would allow individual relations to be treated as having wal_level='minimal' even when the system as a whole has, e.g., wal_level='replica'. Only those relations could be forced into this temporarily unlogged mode.
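As a rough sketch of what that could look like at the SQL level - all of this syntax is hypothetical, none of it exists in PostgreSQL today, and the names are purely illustrative:

```sql
-- Hypothetical: opt this relation in to per-relation minimal WAL logging.
ALTER TABLE big_staging SET (wal_level = 'minimal');

BEGIN;
-- Hypothetical: make the relation unlogged for this transaction only;
-- only tables with the storage parameter above would be eligible.
ALTER TABLE big_staging SET UNLOGGED DEFERRED;
COPY big_staging FROM '/path/to/data.csv' WITH (FORMAT csv);  -- no data WAL
COMMIT;  -- flushes and fsyncs the relation's pages instead of relying on WAL replay
```

The transaction-scoped form keeps the "unlogged" window bounded, so a crash outside the transaction never leaves the table in a state that depends on missing WAL.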
One part I hadn't given thought to is indexes and how they would interact with this plan, mostly due to my lack of internals knowledge.