Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Disable WAL logging to speed up data loading |
Date | |
Msg-id | 20201109173620.GI16415@tamriel.snowman.net |
In response to | Re: Disable WAL logging to speed up data loading ("David G. Johnston" <david.g.johnston@gmail.com>) |
Responses | Re: Disable WAL logging to speed up data loading |
List | pgsql-hackers |

Greetings,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Mon, Nov 9, 2020 at 8:18 AM Stephen Frost <sfrost@snowman.net> wrote:
> > Presently, my feeling is that we could address this use-case without
> > having to introduce a new cluster-wide WAL level, and that's the
> > direction I'd want to see this going. Perhaps I'm missing something
> > about why the approach I've set forth above wouldn't work, and
> > wal_level=none would, but I've not seen it yet.
>
> +1
>
> We are trying to address a performance optimization for an insert-only
> scenario on a limited set of tables by placing the entire cluster in a
> dangerous state. The "copy table unlogged" solution is definitely closer
> to what we want - this is
> demonstrably worse.

Yeah, agreed.

> For this case the fundamental feature that would seem to be required is an
> ability for a transaction commit to return only after the system has
> ensured that all of the new pages added to the relation during the scope of
> the transaction have made it to disk. Something like:
>
> BEGIN UNLOGGED TRANSACTION FOR table1, table2;
> -- locking probably allows reads, definitely disallows concurrent writes,
> to the named tables
> -- Disallow updates and deletes, do not use dead tuple space, for the
> tables named. Should be able to do normal stuff for other tables?
> -- Always create new pages
> COPY TO table1;
> COPY TO table2;
> COMMIT; -- wait here until data files for table1 and table2 are completely
> written and the transaction alive flag is committed to the WAL.

That's certainly an interesting idea, but seems like a much larger step
than just making some improvements to how UNLOGGED tables work today,
and then perhaps some helper options to make it easier to create
UNLOGGED tables and change them from unlogged to logged when the
wal_level is set to 'minimal'.

Also- I don't think this would end up working for normally logged
relations at a wal_level higher than 'minimal', since if we don't
log those pages then they won't get to replicas.

> I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL
> TABLES" and you'd get the initial database population optimization
> capability.

Or just 'BEGIN UNLOGGED TRANSACTION'.. I wonder if we'd have to run
around and lock all tables as you're suggesting above or if we could
just lock them as they get used..

> If the commit doesn't complete all of the newly created pages are junk.
> Otherwise, you have a crash-recoverable state for those tables as regards
> those specific pages.

How would we track that and know which pages are junk?

> Conceptually, we need an ability to perform a partial CHECKPOINT that names
> specific tables, and make sure the crash-recovery works for those tables
> while figuring out what amount of effort to expend on informing the dba and
> alerting/preventing features that require wal from using those tables.

Yeah, seems pretty complicated. Did you see an issue with the basic
idea I proposed earlier, whereby an unlogged table could become
'logged', while we are at wal_level=minimal, by essentially
checkpointing it (locking it, forcing out any buffers we have associated
with it, and then fsync'ing it- not sure how much of that is already
done in the unlogged->logged process but I would guess most of it) while
not actually writing it into the WAL?

Thanks,

Stephen
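
For reference, the existing workflow discussed in this message (load into an UNLOGGED table, then switch it to logged) can be sketched with commands that PostgreSQL already provides. This is only an illustrative sketch and not part of the original message: the table name is borrowed from the quoted example, the column definitions and file path are hypothetical, and the server is assumed to be running with wal_level = minimal already.

```sql
-- Assumption: postgresql.conf already has wal_level = minimal
-- (which also requires max_wal_senders = 0 and a server restart).
SHOW wal_level;

-- Hypothetical table definition, for illustration only.
CREATE UNLOGGED TABLE table1 (
    id      bigint,
    payload text
);

-- Bulk load. Changes to an unlogged table are not written to the WAL,
-- so its contents are truncated after a crash.
-- The path is a placeholder; reading a server-side file needs
-- superuser or the pg_read_server_files role.
COPY table1 FROM '/path/to/table1.csv' WITH (FORMAT csv);

-- Once loading is done, turn it into a normal, crash-safe table.
ALTER TABLE table1 SET LOGGED;
```

How much WAL the final ALTER TABLE ... SET LOGGED generates depends on the server version and on wal_level. The idea raised in the message is that, at wal_level=minimal, this step could flush and fsync the table instead of writing its contents into the WAL.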