Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Disable WAL logging to speed up data loading
Date
Msg-id 20201109192641.GK16415@tamriel.snowman.net
Whole thread Raw
In response to Re: Disable WAL logging to speed up data loading  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
Greetings,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Mon, Nov 9, 2020 at 10:36 AM Stephen Frost <sfrost@snowman.net> wrote:
> > * David G. Johnston (david.g.johnston@gmail.com) wrote:
> > > If the commit doesn't complete all of the newly created pages are junk.
> > > Otherwise, you have a crash-recoverable state for those tables as regards
> > > those specific pages.
> >
> > How would we track that and know which pages are junk?
>
> Every one of those pages would have a single dead transaction id contained
> within it.  If there is bookkeeping that needs to happen that could be wal
> logged - the goal here being not to avoid all wal but to avoid data wal.  I
> don't know enough about the internals here to be more specific.

Yeah, not sure how well that would end up working..  Maybe there's a way
to get there, but definitely a fair bit of complicated work.

> > > Conceptually, we need an ability to perform a partial CHECKPOINT that
> > names
> > > specific tables, and make sure the crash-recovery works for those tables
> > > while figuring out what amount of effort to expend on informing the dba
> > and
> > > alerting/preventing features that require wal from using those tables.
> >
> > Yeah, seems pretty complicated.
> >
> > Did you see an issue with the basic idea I proposed earlier, whereby an
> > unlogged table could become 'logged', while we are at wal_level=minimal,
> > by essentially checkpointing it (locking it, forcing out any buffers we
> > have associated with it, and then fsync'ing it- not sure how much of
> > that is already done in the unlogged->logged process but I would guess
> > most of it) while not actually writing it into the WAL?
>
> That is basically half of what is described above - the part at commit when
> the relation is persisted to disk.

Right, think I agree with you there.

> What your earlier description seems to
> be missing is the part about temporarily making a logged relation
> unlogged.

While I get that there may be use-cases for that, it seems that the
use-case being described here doesn't require it- there just needs to be
a way to take the unlogged table and make it into a 'logged' table
without having to actually write it all into the WAL when wal_level is
minimal.

There may be another option to addressing the use-case as you're looking
at it though- by using partitioning.  That is, there's a partitioned
table which has logged tables in it, a new unlogged table is created and
added to the partitioned table, it's then loaded, and then it's
converted to being logged (or maybe it's loaded first and then added to
the partitioned table, either way).  This would also have the advantage
that you'd be able to continue making changes to the partitioned table
as you normally do, in general.

> I envision that as being part of a transaction as opposed to a
> permanent attribute of the table.  I envision a storage parameter that
> allows individual relations to be considered as having wal_level='minimal'
> even if the system as a whole has, e.g., wal_level='replication'.  Only
> those could be forced into this temporarily unlogged mode.

I mean ... we have a way of saying that individual relations have a
lower WAL level than others- they're UNLOGGED.  I'm still seeing this as
an opportunity to build on that and improve that, rather than invent
something new.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: WIP: BRIN multi-range indexes
Next
From: Anastasia Lubennikova
Date:
Subject: Re: Skip ExecCheckRTPerms in CTAS with no data