Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Disable WAL logging to speed up data loading
Date
Msg-id 20201109151807.GG16415@tamriel.snowman.net
Whole thread Raw
In response to RE: Disable WAL logging to speed up data loading  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses Re: Disable WAL logging to speed up data loading  ("David G. Johnston" <david.g.johnston@gmail.com>)
Re: Disable WAL logging to speed up data loading  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Greetings,

* osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost@snowman.net> wrote:
> > I'm not sure that wal_level=none is really the right way to address this
> > use-case.  We already have unlogged tables and that's pretty clean and
> > meets the "we want to load data fast without having to pay for WAL" use case.
> > The argument here seems to be that to take advantage of unlogged tables
> > requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> > TABLE' command instead of a 'CREATE TABLE' command.  That doesn't
> > seem like a huge leap, but we could make it easier by just adding a
> > 'table_log_default' or such GUC that could be set on the data loading role to
> > have all tables created by it be unlogged.
> I'm afraid to say that in the case to setup all tables as unlogged,
> the user are forced to be under tension to
> back up *all* commands from application, in preparation for unexpected crash.
> This is because whenever the server crashes,
> the unlogged tables are truncated and the DBA needs to
> input the processings after the last backup again without exception.
> I didn't think that this was easy and satisfied the user.

You'll need to explain how this is different from the proposed
'wal_level = none' option, since it sounded like that would be exactly
the same case..?

> In addition, as long as the tables are unlogged, the user cannot be released from
> this condition or (requirement ?) to back up all commands or
> to guarantee that all commands are repeatable for the DBA.

They can change the table to be logged though, if they wish to.

> When I consider the use case is the system of data warehouse
> as described upthread, the size of each table can be large.
> Thus, changing the status from unlogged to logged (recoverable)
> takes much time under the current circumstances, which was discussed before.

Ok- so the issue is that, today, we dump all of the table into the WAL
when we go from unlogged to logged, but as I outlined previously,
perhaps that's because we're missing a trick there when
wal_level=minimal.  If wal_level=minimal, then it would seem like we
could lock the table, then sync it and then mark is as logged, which is
more-or-less what you're asking to have be effectively done with the
proposed wal_level=none, but this would be an optimization for all
existing users of wal_level=minimal who have unlogged tables that they
want to change to logged, and this works on a per-table basis instead,
which seems like a better approach than a cluster-wide setting.

> By having the limited window of time,
> during wal_level=none, I'd like to make wal_level=none work to
> localize and minimize the burden to guarantee all commands are
> repeatable. To achieve this, after switching wal_level from none to higher ones,
> the patch must ensure crash recovery, though.

Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
to marked a bunch of unlogged tables over to being logged would be good
to add too.

> Sorry that my current patch doesn't complete this aspect fully at present
> but, may I have your opinion about this ?

Presently, my feeling is that we could address this use-case without
having to introduce a new cluster-wide WAL level, and that's the
direction I'd want to see this going.  Perhaps I'm missing something
about why the approach I've set forth above wouldn't work, and
wal_level=none would, but I've not seen it yet.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Prevent printing "next step instructions" in initdb and pg_upgrade
Next
From: Tom Lane
Date:
Subject: Re: -Wformat-signedness