Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Disable WAL logging to speed up data loading |
Date | |
Msg-id | 20201109192641.GK16415@tamriel.snowman.net Whole thread Raw |
In response to | Re: Disable WAL logging to speed up data loading ("David G. Johnston" <david.g.johnston@gmail.com>) |
List | pgsql-hackers |
Greetings, * David G. Johnston (david.g.johnston@gmail.com) wrote: > On Mon, Nov 9, 2020 at 10:36 AM Stephen Frost <sfrost@snowman.net> wrote: > > * David G. Johnston (david.g.johnston@gmail.com) wrote: > > > If the commit doesn't complete all of the newly created pages are junk. > > > Otherwise, you have a crash-recoverable state for those tables as regards > > > those specific pages. > > > > How would we track that and know which pages are junk? > > Every one of those pages would have a single dead transaction id contained > within it. If there is bookkeeping that needs to happen that could be wal > logged - the goal here being not to avoid all wal but to avoid data wal. I > don't know enough about the internals here to be more specific. Yeah, not sure how well that would end up working.. Maybe there's a way to get there, but definitely a fair bit of complicated work. > > > Conceptually, we need an ability to perform a partial CHECKPOINT that > > names > > > specific tables, and make sure the crash-recovery works for those tables > > > while figuring out what amount of effort to expend on informing the dba > > and > > > alerting/preventing features that require wal from using those tables. > > > > Yeah, seems pretty complicated. > > > > Did you see an issue with the basic idea I proposed earlier, whereby an > > unlogged table could become 'logged', while we are at wal_level=minimal, > > by essentially checkpointing it (locking it, forcing out any buffers we > > have associated with it, and then fsync'ing it- not sure how much of > > that is already done in the unlogged->logged process but I would guess > > most of it) while not actually writing it into the WAL? > > That is basically half of what is described above - the part at commit when > the relation is persisted to disk. Right, think I agree with you there. > What your earlier description seems to > be missing is the part about temporarily making a logged relation > unlogged. While I get that there may be use-cases for that, it seems that the use-case being described here doesn't require it- there just needs to be a way to take the unlogged table and make it into a 'logged' table without having to actually write it all into the WAL when wal_level is minimal. There may be another option to addressing the use-case as you're looking at it though- by using partitioning. That is, there's a partitioned table which has logged tables in it, a new unlogged table is created and added to the partitioned table, it's then loaded, and then it's converted to being logged (or maybe it's loaded first and then added to the partitioned table, either way). This would also have the advantage that you'd be able to continue making changes to the partitioned table as you normally do, in general. > I envision that as being part of a transaction as opposed to a > permanent attribute of the table. I envision a storage parameter that > allows individual relations to be considered as having wal_level='minimal' > even if the system as a whole has, e.g., wal_level='replication'. Only > those could be forced into this temporarily unlogged mode. I mean ... we have a way of saying that individual relations have a lower WAL level than others- they're UNLOGGED. I'm still seeing this as an opportunity to build on that and improve that, rather than invent something new. Thanks, Stephen
Attachment
pgsql-hackers by date: