Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Disable WAL logging to speed up data loading |
Date | |
Msg-id | 20201109151807.GG16415@tamriel.snowman.net Whole thread Raw |
In response to | RE: Disable WAL logging to speed up data loading ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>) |
Responses |
Re: Disable WAL logging to speed up data loading
Re: Disable WAL logging to speed up data loading |
List | pgsql-hackers |
Greetings, * osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote: > On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost@snowman.net> wrote: > > I'm not sure that wal_level=none is really the right way to address this > > use-case. We already have unlogged tables and that's pretty clean and > > meets the "we want to load data fast without having to pay for WAL" use case. > > The argument here seems to be that to take advantage of unlogged tables > > requires the tools using PG to know how to issue a 'CREATE UNLOGGED > > TABLE' command instead of a 'CREATE TABLE' command. That doesn't > > seem like a huge leap, but we could make it easier by just adding a > > 'table_log_default' or such GUC that could be set on the data loading role to > > have all tables created by it be unlogged. > I'm afraid to say that in the case to setup all tables as unlogged, > the user are forced to be under tension to > back up *all* commands from application, in preparation for unexpected crash. > This is because whenever the server crashes, > the unlogged tables are truncated and the DBA needs to > input the processings after the last backup again without exception. > I didn't think that this was easy and satisfied the user. You'll need to explain how this is different from the proposed 'wal_level = none' option, since it sounded like that would be exactly the same case..? > In addition, as long as the tables are unlogged, the user cannot be released from > this condition or (requirement ?) to back up all commands or > to guarantee that all commands are repeatable for the DBA. They can change the table to be logged though, if they wish to. > When I consider the use case is the system of data warehouse > as described upthread, the size of each table can be large. > Thus, changing the status from unlogged to logged (recoverable) > takes much time under the current circumstances, which was discussed before. Ok- so the issue is that, today, we dump all of the table into the WAL when we go from unlogged to logged, but as I outlined previously, perhaps that's because we're missing a trick there when wal_level=minimal. If wal_level=minimal, then it would seem like we could lock the table, then sync it and then mark is as logged, which is more-or-less what you're asking to have be effectively done with the proposed wal_level=none, but this would be an optimization for all existing users of wal_level=minimal who have unlogged tables that they want to change to logged, and this works on a per-table basis instead, which seems like a better approach than a cluster-wide setting. > By having the limited window of time, > during wal_level=none, I'd like to make wal_level=none work to > localize and minimize the burden to guarantee all commands are > repeatable. To achieve this, after switching wal_level from none to higher ones, > the patch must ensure crash recovery, though. Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE to marked a bunch of unlogged tables over to being logged would be good to add too. > Sorry that my current patch doesn't complete this aspect fully at present > but, may I have your opinion about this ? Presently, my feeling is that we could address this use-case without having to introduce a new cluster-wide WAL level, and that's the direction I'd want to see this going. Perhaps I'm missing something about why the approach I've set forth above wouldn't work, and wal_level=none would, but I've not seen it yet. Thanks, Stephen
Attachment
pgsql-hackers by date: