Thread: Disable WAL logging to speed up data loading

Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello,


We'd like to propose a feature to disable WAL to speed up data loading.  This was inspired by a feature added in the
latestMySQL.  I wish you won't fear this feature... 


BACKGROUND
========================================

This branches off from [1] as mentioned therein.  Briefly speaking, a customer wants to shorten the time for nightly
loadingof data into their data warehouse as much as possible to be prepared for using the data warehouse for new
things.

Currently, they are using Oracle's SQL*Loader with its no-logging feature.  They want a similar feature to migrate to
Postgres. Other than the data loading performance, they don't want to be concerned about the storage for large volumes
ofWAL. 

In [1], we thought about something like Oracle's per-table no-logging feature, but it seems difficult (or at least not
easy.) Meanwhile, I found another feature added in the latest MySQL 8.0.21 [2].  This proposal follows it almost
directly. That satisfies the customer request. 

As an aside, it's also conceivable that in the near future, users could see the WAL bottleneck (WAL buffer or disk)
whenthey utilize the parallel COPY that is being developed in the community. 


FUNCTIONAL SPECIFICATION
========================================

Add a new value 'none' to the server configuration parameter wal_level.  With this setting:

* No WAL is emitted.

* The server refuses to start (pg_ctl start fails) after an abnormal shutdown due to power outage, pg_ctl's immediate
shutdown,etc, showing a straightforward message like MySQL. 

* Features like continuous archiving, pg_basebackup, and streaming/logical replication that requires wal_level >=
replicaare not available. 

* The user can use all features again if you shut down the server successfully after data loading and reset wal_level
toa value other than none.  He needs to take a base backup or rebuild the replication standby after restarting the
server.


In addition to the cosmetic modifications to the manual articles that refer to wal_level, add a clause or paragraphs to
thefollowing sections to let users know the availability of this feature. 

14.4. Populating a Database
18.6.1. Upgrading Data via pg_dumpall


PROGRAM DESIGN (main point only)
========================================

As in the bootstrap mode (during initdb), when wal_level = none, XLogInsert() does nothing and just returns a fixed
value,which is the tail of the last shutdown checkpoint WAL record.  As a result, the value is set to the relation page
header'sLSN field. 

In addition, it might be worth having XLogBeginInsert() and XLogRec...() to check wal_level and just return.  I don't
expectmuch from this, but it may be interesting to give it a try to see the squeezed performance. 

StartupXLOG() checks the wal_level setting in pg_control and quits the startup with ereport(FATAL) accordingly.


[1]
Implement UNLOGGED clause for COPY FROM
https://www.postgresql.org/message-id/OSBPR01MB488887C0BDC5129C65DFC5E5ED640@OSBPR01MB4888.jpnprd01.prod.outlook.com

[2]
Disabling Redo Logging
https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging


Regards
Takayuki Tsunakawa




Re: Disable WAL logging to speed up data loading

From
Amul Sul
Date:
On Tue, Sep 29, 2020 at 1:58 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> Hello,
>
>
> We'd like to propose a feature to disable WAL to speed up data loading.  This was inspired by a feature added in the
latestMySQL.  I wish you won't fear this feature...
 
>

TWIMW, pg_bulkload contrib module[1], also does the same for the
faster data loading.

Regards,
Amul

1] http://ossc-db.github.io/pg_bulkload/pg_bulkload.html



Re: Disable WAL logging to speed up data loading

From
Ashutosh Bapat
Date:
Can they use a database with all unlogged tables?

On Tue, Sep 29, 2020 at 1:58 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> Hello,
>
>
> We'd like to propose a feature to disable WAL to speed up data loading.  This was inspired by a feature added in the
latestMySQL.  I wish you won't fear this feature... 
>
>
> BACKGROUND
> ========================================
>
> This branches off from [1] as mentioned therein.  Briefly speaking, a customer wants to shorten the time for nightly
loadingof data into their data warehouse as much as possible to be prepared for using the data warehouse for new
things.
>
> Currently, they are using Oracle's SQL*Loader with its no-logging feature.  They want a similar feature to migrate to
Postgres. Other than the data loading performance, they don't want to be concerned about the storage for large volumes
ofWAL. 
>
> In [1], we thought about something like Oracle's per-table no-logging feature, but it seems difficult (or at least
noteasy.)  Meanwhile, I found another feature added in the latest MySQL 8.0.21 [2].  This proposal follows it almost
directly. That satisfies the customer request. 
>
> As an aside, it's also conceivable that in the near future, users could see the WAL bottleneck (WAL buffer or disk)
whenthey utilize the parallel COPY that is being developed in the community. 
>
>
> FUNCTIONAL SPECIFICATION
> ========================================
>
> Add a new value 'none' to the server configuration parameter wal_level.  With this setting:
>
> * No WAL is emitted.
>
> * The server refuses to start (pg_ctl start fails) after an abnormal shutdown due to power outage, pg_ctl's immediate
shutdown,etc, showing a straightforward message like MySQL. 
>
> * Features like continuous archiving, pg_basebackup, and streaming/logical replication that requires wal_level >=
replicaare not available. 
>
> * The user can use all features again if you shut down the server successfully after data loading and reset wal_level
toa value other than none.  He needs to take a base backup or rebuild the replication standby after restarting the
server.
>
>
> In addition to the cosmetic modifications to the manual articles that refer to wal_level, add a clause or paragraphs
tothe following sections to let users know the availability of this feature. 
>
> 14.4. Populating a Database
> 18.6.1. Upgrading Data via pg_dumpall
>
>
> PROGRAM DESIGN (main point only)
> ========================================
>
> As in the bootstrap mode (during initdb), when wal_level = none, XLogInsert() does nothing and just returns a fixed
value,which is the tail of the last shutdown checkpoint WAL record.  As a result, the value is set to the relation page
header'sLSN field. 
>
> In addition, it might be worth having XLogBeginInsert() and XLogRec...() to check wal_level and just return.  I don't
expectmuch from this, but it may be interesting to give it a try to see the squeezed performance. 
>
> StartupXLOG() checks the wal_level setting in pg_control and quits the startup with ereport(FATAL) accordingly.
>
>
> [1]
> Implement UNLOGGED clause for COPY FROM
> https://www.postgresql.org/message-id/OSBPR01MB488887C0BDC5129C65DFC5E5ED640@OSBPR01MB4888.jpnprd01.prod.outlook.com
>
> [2]
> Disabling Redo Logging
> https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging
>
>
> Regards
> Takayuki Tsunakawa
>
>
>


--
Best Wishes,
Ashutosh Bapat



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi, Amul

> > We'd like to propose a feature to disable WAL to speed up data loading.  This
> was inspired by a feature added in the latest MySQL.  I wish you won't fear
> this feature...
> >
> 
> TWIMW, pg_bulkload contrib module[1], also does the same for the faster data
> loading.

Both features are helpful to make the data loading faster,
but those are different.

There are at least two major merits as their differences to use wal_level='none'.
The first one happens when user upgrades major version by pg_dumpall.
Imagine a case that one user gets a logical backup of whole cluster by pg_dumpall. 
The output file contains many COPY commands in it to recreate and upgrade the cluster.

Setting wal_level='none' can be easily used to boost the speed to remake the
cluster of a newer version by setting the wal_level.
OTOH, pg_bulkload can't treat this flow of upgrade in an easy way.
It requires to plan and write the detail of control files or commands for each table manually.
This wouldn't be easy for users.

Secondly,
to use pg_bulkload requires us to use a wrapper command of pg_ctl
like "pg_bulkload -r", which is prepared only for the feature.
This command is used when pg_bulkload is crashed.
The document recommends not to use pg_ctl directly in such a case.
Like this, paying attention to such usages or minor differences of usage is troublesome
while running a long operation of service, without support of the OSS community.

What did you think ?

Regards,
    Takamichi Osumi

RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello, Ashutosh

> Can they use a database with all unlogged tables?
Unfortunately, no. They want to switch a cluster condition to "without WAL logging"
only when they execute night bulk loading for their data warehouse.
In other words, they would like to keep their other usual operations with WAL.
In addition, using all tables as unlogged brings about
the risk to lose data warehouse's data caused by an unexpected server crash or power outage.

Regards,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Fujii Masao
Date:

On 2020/09/30 12:10, osumi.takamichi@fujitsu.com wrote:
> Hello, Ashutosh
> 
>> Can they use a database with all unlogged tables?
> Unfortunately, no. They want to switch a cluster condition to "without WAL logging"
> only when they execute night bulk loading for their data warehouse.
> In other words, they would like to keep their other usual operations with WAL.
> In addition, using all tables as unlogged brings about
> the risk to lose data warehouse's data caused by an unexpected server crash or power outage.

But the same issue can happen even in the proposed approach
because Tsunakawa-san explains as follows?

> The server refuses to start (pg_ctl start fails) after an abnormal shutdown due to power outage, pg_ctl's immediate
shutdown,etc, showing a straightforward message like MySQL.
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Thu, 1 Oct 2020 10:01:56 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/09/30 12:10, osumi.takamichi@fujitsu.com wrote:
> > Hello, Ashutosh
> > 
> >> Can they use a database with all unlogged tables?
> > Unfortunately, no. They want to switch a cluster condition to "without
> > WAL logging"
> > only when they execute night bulk loading for their data warehouse.
> > In other words, they would like to keep their other usual operations
> > with WAL.
> > In addition, using all tables as unlogged brings about
> > the risk to lose data warehouse's data caused by an unexpected server
> > crash or power outage.
> 
> But the same issue can happen even in the proposed approach
> because Tsunakawa-san explains as follows?
>
> > The server refuses to start (pg_ctl start fails) after an abnormal
> > shutdown due to power outage, pg_ctl's immediate shutdown, etc,
> > showing a straightforward message like MySQL.

Exactly. Crash safety cannot be a rationale for this feature.

The proposed feature seems to give an advantage to a massive loading
onto a already very large table.  Currently turning a large logged
table into an unlogged one runs a table copy that takes a long
time. After the bulk-loading, to put back the table huge amount of WAL
is emitted and a table-copy takes a long time.  If we had the
wal_level=none feature, the preparation time is not needed and the
put-back time can be offloaded to online backing-up.  The crash-unsafe
period is inevitable but would be shorten by using storage snapshots.
On the other hand, a massive loading to an empty table is efficiently
performed by wal_level=minimal.

As I mentioned in another thread, if ALTER TABLE LOGGED/UNLOGGED
doesn't need a table copy (consequently no insertion WAL are emitted),
it would work. I'm not which is better, or easier to realize.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello.

> >> Can they use a database with all unlogged tables?
> > Unfortunately, no. They want to switch a cluster condition to "without WAL
> logging"
> > only when they execute night bulk loading for their data warehouse.
> > In other words, they would like to keep their other usual operations with WAL.
> > In addition, using all tables as unlogged brings about the risk to
> > lose data warehouse's data caused by an unexpected server crash or power
> outage.
> 
> But the same issue can happen even in the proposed approach because
> Tsunakawa-san explains as follows?
Sorry, my last expression about "to lose data" was not precise.

When unlogged tables are used and the server is crashed unexpectedly,
all data after the last backup is truncated without WAL, right ?
In this case, sequential commands for unlogged tables disappear.
On the other hand, wal_level='none' can be changed to 'minimal' for example.
Therefore, such other sequential operations after the last backup
could be stored in WAL file, when a user applies such a change of wal_level.
I meant this is helpful but didn't this make sense ?

> > The server refuses to start (pg_ctl start fails) after an abnormal shutdown due
> to power outage, pg_ctl's immediate shutdown, etc, showing a straightforward
> message like MySQL.
This is the behavior during wal_level='none'.

> * The user can use all features again if you shut down the server successfully
> after data loading and reset wal_level to a value other than none.  He needs to
> take a base backup or rebuild the replication standby after restarting the server.
And, we expect we can change the wal_level even after using wal_level='none'
and can use other features again like I mentioned above. What did you think ?

Regards,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Fujii Masao
Date:

On 2020/10/01 12:04, osumi.takamichi@fujitsu.com wrote:
> Hello.
> 
>>>> Can they use a database with all unlogged tables?
>>> Unfortunately, no. They want to switch a cluster condition to "without WAL
>> logging"
>>> only when they execute night bulk loading for their data warehouse.
>>> In other words, they would like to keep their other usual operations with WAL.
>>> In addition, using all tables as unlogged brings about the risk to
>>> lose data warehouse's data caused by an unexpected server crash or power
>> outage.
>>
>> But the same issue can happen even in the proposed approach because
>> Tsunakawa-san explains as follows?
> Sorry, my last expression about "to lose data" was not precise.
> 
> When unlogged tables are used and the server is crashed unexpectedly,
> all data after the last backup is truncated without WAL, right ?
> In this case, sequential commands for unlogged tables disappear.

To avoid this issue we can mark the table as logged. You may be worried about
that ALTER TABLE SET LOGGED/UNLOGGED would take a long time because
the table needs to be rewriitten. One idea for that is to improve that command
so that it skips the table rewrite if wal_level=minimal.
Of course, also you can change wal_level after marking the table as unlogged.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Thu, 1 Oct 2020 12:58:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> the table needs to be rewriitten. One idea for that is to improve that
> command
> so that it skips the table rewrite if wal_level=minimal.
> Of course, also you can change wal_level after marking the table as
> unlogged.

tablecmd.c:

> * There are two reasons for requiring a rewrite when changing
> * persistence: on one hand, we need to ensure that the buffers
> * belonging to each of the two relations are marked with or without
> * BM_PERMANENT properly.  On the other hand, since rewriting creates
> * and assigns a new relfilenode, we automatically create or drop an
> * init fork for the relation as appropriate.

According to this comment, perhaps we can do that at least for
wal_level=minimal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi, Horiguchi-San and Fujii-San.


Thank you so much both of you.
> > the table needs to be rewriitten. One idea for that is to improve that
> > command so that it skips the table rewrite if wal_level=minimal.
> > Of course, also you can change wal_level after marking the table as
> > unlogged.
>
> tablecmd.c:
The idea is really interesting.
I didn't come up with getting rid of the whole copy of
the ALTER TABLE UNLOGGED/LOGGED commands
only when wal_level='minimal'.

> > * There are two reasons for requiring a rewrite when changing
> > * persistence: on one hand, we need to ensure that the buffers
> > * belonging to each of the two relations are marked with or without
> > * BM_PERMANENT properly.  On the other hand, since rewriting creates
> > * and assigns a new relfilenode, we automatically create or drop an
> > * init fork for the relation as appropriate.
Thanks for sharing concrete comments in the source code.

> According to this comment, perhaps we can do that at least for
> wal_level=minimal.
When I compare the 2 ideas,
one of the benefits of this ALTER TABLE 's improvement
is that we can't avoid the downtime
while that of wal_level='none' provides an easy and faster
major version up via output file of pg_dumpall.
Both ideas have good points.
However, actually to modify ALTER TABLE's copy
looks far more difficult than wal_level='none' and
beyond my current ability.
So, I'd like to go forward with the direction of wal_level='none'.
Did you have strong objections for this direction ?


Regards,
    Takamichi Osumi



Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Thu, 1 Oct 2020 08:14:42 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in 
> Hi, Horiguchi-San and Fujii-San.
> 
> 
> Thank you so much both of you.
> > > the table needs to be rewriitten. One idea for that is to improve that
> > > command so that it skips the table rewrite if wal_level=minimal.
> > > Of course, also you can change wal_level after marking the table as
> > > unlogged.
> > 
> > tablecmd.c:
> The idea is really interesting.
> I didn't come up with getting rid of the whole copy of
> the ALTER TABLE UNLOGGED/LOGGED commands
> only when wal_level='minimal'.
> 
> > > * There are two reasons for requiring a rewrite when changing
> > > * persistence: on one hand, we need to ensure that the buffers
> > > * belonging to each of the two relations are marked with or without
> > > * BM_PERMANENT properly.  On the other hand, since rewriting creates
> > > * and assigns a new relfilenode, we automatically create or drop an
> > > * init fork for the relation as appropriate.
> Thanks for sharing concrete comments in the source code.
> 
> > According to this comment, perhaps we can do that at least for
> > wal_level=minimal.
> When I compare the 2 ideas,
> one of the benefits of this ALTER TABLE 's improvement 
> is that we can't avoid the downtime
> while that of wal_level='none' provides an easy and faster
> major version up via output file of pg_dumpall.

The speedup has already been achieved with higher durability by
wal_level=minimal in that case.  Or maybe you should consider using
pg_upgrade instead.  Even inducing the time to take a backup copy of
the whole cluster, running pg_upgrade would be far faster than
pg_dumpall then loading.

> Both ideas have good points.
> However, actually to modify ALTER TABLE's copy
> looks far more difficult than wal_level='none' and
> beyond my current ability.
> So, I'd like to go forward with the direction of wal_level='none'.
> Did you have strong objections for this direction ?

For fuel(?) of the discussion, I tried a very-quick PoC for in-place
ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
trials of several ways, I drifted to the following way after poking
several ways.

1. Flip BM_PERMANENT of active buffers
2. adding/removing init fork
3. sync files,
4. Flip pg_class.relpersistence.

It always skips table copy in the SET UNLOGGED case, and only when
wal_level=minimal in the SET LOGGED case.  Crash recovery seems
working by some brief testing by hand.

Of course, I haven't performed intensive test on it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dcaea7135f..d8a1c2a7e7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -613,6 +613,26 @@ heapam_relation_set_new_filenode(Relation rel,
     smgrclose(srel);
 }
 
+static void
+heapam_relation_set_persistence(Relation rel, char persistence)
+{
+    Assert(rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
+           rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
+
+    Assert (rel->rd_rel->relpersistence != persistence);
+
+    if (persistence == RELPERSISTENCE_UNLOGGED)
+    {
+        Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+               rel->rd_rel->relkind == RELKIND_MATVIEW ||
+               rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+        RelationCreateInitFork(rel->rd_node);
+    }
+    else
+        RelationDropInitFork(rel->rd_node);
+}
+
+
 static void
 heapam_relation_nontransactional_truncate(Relation rel)
 {
@@ -2540,6 +2560,7 @@ static const TableAmRoutine heapam_methods = {
     .compute_xid_horizon_for_tuples = heap_compute_xid_horizon_for_tuples,
 
     .relation_set_new_filenode = heapam_relation_set_new_filenode,
+    .relation_set_persistence = heapam_relation_set_persistence,
     .relation_nontransactional_truncate = heapam_relation_nontransactional_truncate,
     .relation_copy_data = heapam_relation_copy_data,
     .relation_copy_for_cluster = heapam_relation_copy_for_cluster,
diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index a7c0cb1bc3..8397002613 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,14 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
                          xlrec->blkno, xlrec->flags);
         pfree(path);
     }
+    else if (info == XLOG_SMGR_UNLINK)
+    {
+        xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+        char       *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+        appendStringInfoString(buf, path);
+        pfree(path);
+    }
 }
 
 const char *
@@ -55,6 +63,9 @@ smgr_identify(uint8 info)
         case XLOG_SMGR_TRUNCATE:
             id = "TRUNCATE";
             break;
+        case XLOG_SMGR_UNLINK:
+            id = "UNLINK";
+            break;
     }
 
     return id;
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index dbbd3aa31f..7f695e6d8b 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -60,6 +60,8 @@ int            wal_skip_threshold = 2048;    /* in kilobytes */
 typedef struct PendingRelDelete
 {
     RelFileNode relnode;        /* relation that may need to be deleted */
+    bool        deleteinitfork;    /* delete only init fork if true */
+    bool        createinitfork;    /* create init fork if true */
     BackendId    backend;        /* InvalidBackendId if not a temp rel */
     bool        atCommit;        /* T=delete at commit; F=delete at abort */
     int            nestLevel;        /* xact nesting level of request */
@@ -153,6 +155,8 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
     pending = (PendingRelDelete *)
         MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
     pending->relnode = rnode;
+    pending->deleteinitfork = false;
+    pending->createinitfork = false;
     pending->backend = backend;
     pending->atCommit = false;    /* delete if abort */
     pending->nestLevel = GetCurrentTransactionNestLevel();
@@ -168,6 +172,53 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
     return srel;
 }
 
+void
+RelationCreateInitFork(RelFileNode rnode)
+{
+    PendingRelDelete *pending;
+    SMgrRelation srel;
+
+    srel = smgropen(rnode, InvalidBackendId);
+    smgrcreate(srel, INIT_FORKNUM, false);
+    log_smgrcreate(&rnode, INIT_FORKNUM);
+    smgrimmedsync(srel, INIT_FORKNUM);
+
+    /* Add the relation to the list of stuff to delete at abort */
+    pending = (PendingRelDelete *)
+        MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
+    pending->relnode = rnode;
+    pending->deleteinitfork = true;
+    pending->createinitfork = false;
+    pending->backend = InvalidBackendId;
+    pending->atCommit = false;    /* delete if abort */
+    pending->nestLevel = GetCurrentTransactionNestLevel();
+    pending->next = pendingDeletes;
+    pendingDeletes = pending;
+}
+
+void
+RelationDropInitFork(RelFileNode rnode)
+{
+    PendingRelDelete *pending;
+    SMgrRelation srel;
+
+    srel = smgropen(rnode, InvalidBackendId);
+    smgrunlink(srel, INIT_FORKNUM, false);
+    log_smgrunlink(&rnode, INIT_FORKNUM);
+
+    /* Add the relation to the list of stuff to delete at abort */
+    pending = (PendingRelDelete *)
+        MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
+    pending->relnode = rnode;
+    pending->deleteinitfork = false;
+    pending->createinitfork = true;
+    pending->backend = InvalidBackendId;
+    pending->atCommit = false;    /* create if abort */
+    pending->nestLevel = GetCurrentTransactionNestLevel();
+    pending->next = pendingDeletes;
+    pendingDeletes = pending;
+}
+
 /*
  * Perform XLogInsert of an XLOG_SMGR_CREATE record to WAL.
  */
@@ -187,6 +238,25 @@ log_smgrcreate(const RelFileNode *rnode, ForkNumber forkNum)
     XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE | XLR_SPECIAL_REL_UPDATE);
 }
 
+/*
+ * Perform XLogInsert of an XLOG_SMGR_UNLINK record to WAL.
+ */
+void
+log_smgrunlink(const RelFileNode *rnode, ForkNumber forkNum)
+{
+    xl_smgr_unlink xlrec;
+
+    /*
+     * Make an XLOG entry reporting the file unlink.
+     */
+    xlrec.rnode = *rnode;
+    xlrec.forkNum = forkNum;
+
+    XLogBeginInsert();
+    XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+    XLogInsert(RM_SMGR_ID, XLOG_SMGR_UNLINK | XLR_SPECIAL_REL_UPDATE);
+}
+
 /*
  * RelationDropStorage
  *        Schedule unlinking of physical storage at transaction commit.
@@ -200,6 +270,8 @@ RelationDropStorage(Relation rel)
     pending = (PendingRelDelete *)
         MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
     pending->relnode = rel->rd_node;
+    pending->createinitfork = false;
+    pending->deleteinitfork = false;
     pending->backend = rel->rd_backend;
     pending->atCommit = true;    /* delete if commit */
     pending->nestLevel = GetCurrentTransactionNestLevel();
@@ -625,19 +697,34 @@ smgrDoPendingDeletes(bool isCommit)
 
                 srel = smgropen(pending->relnode, pending->backend);
 
-                /* allocate the initial array, or extend it, if needed */
-                if (maxrels == 0)
+                /* XXXX */
+                if (pending->createinitfork)
                 {
-                    maxrels = 8;
-                    srels = palloc(sizeof(SMgrRelation) * maxrels);
+                    smgrcreate(srel, INIT_FORKNUM, false);
+                    log_smgrcreate(&pending->relnode, INIT_FORKNUM);
+                    smgrimmedsync(srel, INIT_FORKNUM);
                 }
-                else if (maxrels <= nrels)
+                else if (pending->deleteinitfork)
                 {
-                    maxrels *= 2;
-                    srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+                    smgrunlink(srel, INIT_FORKNUM, false);
+                    log_smgrunlink(&pending->relnode, INIT_FORKNUM);
                 }
+                else
+                {
+                    /* allocate the initial array, or extend it, if needed */
+                    if (maxrels == 0)
+                    {
+                        maxrels = 8;
+                        srels = palloc(sizeof(SMgrRelation) * maxrels);
+                    }
+                    else if (maxrels <= nrels)
+                    {
+                        maxrels *= 2;
+                        srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+                    }
 
-                srels[nrels++] = srel;
+                    srels[nrels++] = srel;
+                }
             }
             /* must explicitly free the list entry */
             pfree(pending);
@@ -916,6 +1003,14 @@ smgr_redo(XLogReaderState *record)
         reln = smgropen(xlrec->rnode, InvalidBackendId);
         smgrcreate(reln, xlrec->forkNum, true);
     }
+    else if (info == XLOG_SMGR_UNLINK)
+    {
+        xl_smgr_unlink *xlrec = (xl_smgr_unlink *) XLogRecGetData(record);
+        SMgrRelation reln;
+
+        reln = smgropen(xlrec->rnode, InvalidBackendId);
+        smgrunlink(reln, xlrec->forkNum, true);
+    }
     else if (info == XLOG_SMGR_TRUNCATE)
     {
         xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e0ac4e05e5..fdcdb1dbb1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4897,6 +4897,89 @@ ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
     return newcmd;
 }
 
+static bool
+try_inplace_persistence_change(AlteredTableInfo *tab, char persistence,
+                               LOCKMODE lockmode)
+{
+    Relation     rel;
+    Relation    classRel;
+    HeapTuple    tuple,
+                newtuple;
+    Datum        new_val[Natts_pg_class];
+    bool        new_null[Natts_pg_class],
+                new_repl[Natts_pg_class];
+    int            i;
+
+    Assert(tab->rewrite == AT_REWRITE_ALTER_PERSISTENCE);
+    Assert(lockmode == AccessExclusiveLock);
+
+    /*
+     * Under the following condition, we need to call ATRewriteTable, but
+     * cannot be false in the AT_REWRITE_ALTER_PERSISTENCE case.
+     */
+    Assert(tab->constraints == NULL && tab->partition_constraint == NULL &&
+           tab->newvals == NULL && !tab->verify_new_notnull);
+
+    /*
+     * In the case of SET ULOGGED, the resulting relation does not need WALs,
+     * so the storage can be used as-is. In the opposite case, the table is
+     * required to be restored from WAL, so rewrite the relation.  However,
+     * when wal_level = minimal, we can omit WALs by immediately syncing the
+     * storage.
+     */
+    if (tab->newrelpersistence == RELPERSISTENCE_PERMANENT || XLogIsNeeded())
+        return false;
+
+    rel = table_open(tab->relid, lockmode);
+
+    Assert(rel->rd_rel->relpersistence != persistence);
+
+    elog(DEBUG1, "perform im-place persistnce change");
+
+    RelationOpenSmgr(rel);
+    SetRelFileNodeBuffersPersistence(rel->rd_smgr->smgr_rnode,
+                                     persistence == RELPERSISTENCE_PERMANENT);
+    table_relation_set_persistence(rel, persistence);
+
+    /*
+     * This relation is now WAL-logged. Sync all files immediately to establish
+     * the initial state.
+     */
+    if (persistence == RELPERSISTENCE_PERMANENT)
+    {
+        for (i = 0 ; i < MAX_FORKNUM ; i++)
+        {
+            if (smgrexists(rel->rd_smgr, i))
+                smgrimmedsync(rel->rd_smgr, i);
+        }
+    }
+
+    table_close(rel, lockmode);
+    classRel = table_open(RelationRelationId, RowExclusiveLock);
+    tuple = SearchSysCacheCopy1(RELOID,
+                                ObjectIdGetDatum(RelationGetRelid(rel)));
+    if (!HeapTupleIsValid(tuple))
+        elog(ERROR, "cache lookup failed for relation %u",
+             RelationGetRelid(rel));
+
+    memset(new_val, 0, sizeof(new_val));
+    memset(new_null, false, sizeof(new_null));
+    memset(new_repl, false, sizeof(new_repl));
+
+    new_val[Anum_pg_class_relpersistence - 1] = CharGetDatum(persistence);
+    new_null[Anum_pg_class_relpersistence - 1] = false;
+    new_repl[Anum_pg_class_relpersistence - 1] = true;
+
+    newtuple = heap_modify_tuple(tuple, RelationGetDescr(classRel),
+                                 new_val, new_null, new_repl);
+
+    CatalogTupleUpdate(classRel, &newtuple->t_self, newtuple);
+    heap_freetuple(newtuple);
+    table_close(classRel, RowExclusiveLock);
+
+    return true;
+}
+
 /*
  * ATRewriteTables: ALTER TABLE phase 3
  */
@@ -5017,45 +5100,51 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
                                          tab->relid,
                                          tab->rewrite);
 
-            /*
-             * Create transient table that will receive the modified data.
-             *
-             * Ensure it is marked correctly as logged or unlogged.  We have
-             * to do this here so that buffers for the new relfilenode will
-             * have the right persistence set, and at the same time ensure
-             * that the original filenode's buffers will get read in with the
-             * correct setting (i.e. the original one).  Otherwise a rollback
-             * after the rewrite would possibly result with buffers for the
-             * original filenode having the wrong persistence setting.
-             *
-             * NB: This relies on swap_relation_files() also swapping the
-             * persistence. That wouldn't work for pg_class, but that can't be
-             * unlogged anyway.
-             */
-            OIDNewHeap = make_new_heap(tab->relid, NewTableSpace, persistence,
-                                       lockmode);
+            if (tab->rewrite != AT_REWRITE_ALTER_PERSISTENCE ||
+                !try_inplace_persistence_change(tab, persistence, lockmode))
+            {
+                /*
+                 * Create transient table that will receive the modified data.
+                 *
+                 * Ensure it is marked correctly as logged or unlogged.  We
+                 * have to do this here so that buffers for the new relfilenode
+                 * will have the right persistence set, and at the same time
+                 * ensure that the original filenode's buffers will get read in
+                 * with the correct setting (i.e. the original one).  Otherwise
+                 * a rollback after the rewrite would possibly result with
+                 * buffers for the original filenode having the wrong
+                 * persistence setting.
+                 *
+                 * NB: This relies on swap_relation_files() also swapping the
+                 * persistence. That wouldn't work for pg_class, but that can't
+                 * be unlogged anyway.
+                 */
+                OIDNewHeap = make_new_heap(tab->relid, NewTableSpace, persistence,
+                                           lockmode);
 
-            /*
-             * Copy the heap data into the new table with the desired
-             * modifications, and test the current data within the table
-             * against new constraints generated by ALTER TABLE commands.
-             */
-            ATRewriteTable(tab, OIDNewHeap, lockmode);
+                /*
+                 * Copy the heap data into the new table with the desired
+                 * modifications, and test the current data within the table
+                 * against new constraints generated by ALTER TABLE commands.
+                 */
+                ATRewriteTable(tab, OIDNewHeap, lockmode);
 
-            /*
-             * Swap the physical files of the old and new heaps, then rebuild
-             * indexes and discard the old heap.  We can use RecentXmin for
-             * the table's new relfrozenxid because we rewrote all the tuples
-             * in ATRewriteTable, so no older Xid remains in the table.  Also,
-             * we never try to swap toast tables by content, since we have no
-             * interest in letting this code work on system catalogs.
-             */
-            finish_heap_swap(tab->relid, OIDNewHeap,
-                             false, false, true,
-                             !OidIsValid(tab->newTableSpace),
-                             RecentXmin,
-                             ReadNextMultiXactId(),
-                             persistence);
+                /*
+                 * Swap the physical files of the old and new heaps, then
+                 * rebuild indexes and discard the old heap.  We can use
+                 * RecentXmin for the table's new relfrozenxid because we
+                 * rewrote all the tuples in ATRewriteTable, so no older Xid
+                 * remains in the table.  Also, we never try to swap toast
+                 * tables by content, since we have no interest in letting this
+                 * code work on system catalogs.
+                 */
+                finish_heap_swap(tab->relid, OIDNewHeap,
+                                 false, false, true,
+                                 !OidIsValid(tab->newTableSpace),
+                                 RecentXmin,
+                                 ReadNextMultiXactId(),
+                                 persistence);
+            }
         }
         else
         {
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e549fa1d30..7fd6cae12f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3031,6 +3031,40 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
     }
 }
 
+void
+SetRelFileNodeBuffersPersistence(RelFileNodeBackend rnode, bool permanent)
+{
+    int            i;
+
+    Assert (!RelFileNodeBackendIsTemp(rnode));
+
+    for (i = 0; i < NBuffers; i++)
+    {
+        BufferDesc *bufHdr = GetBufferDescriptor(i);
+        uint32        buf_state;
+
+        if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+            continue;
+
+        buf_state = LockBufHdr(bufHdr);
+
+        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+        {
+            if (permanent)
+            {
+                Assert ((buf_state &  BM_PERMANENT) == 0);
+                buf_state |= BM_PERMANENT;
+            }
+            else
+            {
+                Assert ((buf_state &  BM_PERMANENT) != 0);
+                buf_state &= ~BM_PERMANENT;
+            }
+        }
+        UnlockBufHdr(bufHdr, buf_state);
+    }
+}
+
 /* ---------------------------------------------------------------------
  *        DropRelFileNodesAllBuffers
  *
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dcc09df0c7..5eb9e97b3d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -645,6 +645,12 @@ smgrimmedsync(SMgrRelation reln, ForkNumber forknum)
     smgrsw[reln->smgr_which].smgr_immedsync(reln, forknum);
 }
 
+void
+smgrunlink(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+{
+    smgrsw[reln->smgr_which].smgr_unlink(reln->smgr_rnode, forknum, isRedo);
+}
+
 /*
  * AtEOXact_SMgr
  *
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 387eb34a61..245a8f0fdd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -451,6 +451,8 @@ typedef struct TableAmRoutine
                                               TransactionId *freezeXid,
                                               MultiXactId *minmulti);
 
+    void        (*relation_set_persistence) (Relation rel, char persistence);
+
     /*
      * This callback needs to remove all contents from `rel`'s current
      * relfilenode. No provisions for transactional behaviour need to be made.
@@ -1404,6 +1406,12 @@ table_relation_set_new_filenode(Relation rel,
                                                freezeXid, minmulti);
 }
 
+static inline void
+table_relation_set_persistence(Relation rel, char persistence)
+{
+    rel->rd_tableam->relation_set_persistence(rel, persistence);
+}
+
 /*
  * Remove all table contents from `rel`, in a non-transactional manner.
  * Non-transactional meaning that there's no need to support rollbacks. This
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index 30c38e0ca6..f4fd61f639 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -23,6 +23,8 @@
 extern int    wal_skip_threshold;
 
 extern SMgrRelation RelationCreateStorage(RelFileNode rnode, char relpersistence);
+extern void RelationCreateInitFork(RelFileNode rel);
+extern void RelationDropInitFork(RelFileNode rel);
 extern void RelationDropStorage(Relation rel);
 extern void RelationPreserveStorage(RelFileNode rnode, bool atCommit);
 extern void RelationPreTruncate(Relation rel);
diff --git a/src/include/catalog/storage_xlog.h b/src/include/catalog/storage_xlog.h
index 7b21cab2e0..73ad2ae89e 100644
--- a/src/include/catalog/storage_xlog.h
+++ b/src/include/catalog/storage_xlog.h
@@ -29,6 +29,7 @@
 /* XLOG gives us high 4 bits */
 #define XLOG_SMGR_CREATE    0x10
 #define XLOG_SMGR_TRUNCATE    0x20
+#define XLOG_SMGR_UNLINK    0x30
 
 typedef struct xl_smgr_create
 {
@@ -36,6 +37,12 @@ typedef struct xl_smgr_create
     ForkNumber    forkNum;
 } xl_smgr_create;
 
+typedef struct xl_smgr_unlink
+{
+    RelFileNode rnode;
+    ForkNumber    forkNum;
+} xl_smgr_unlink;
+
 /* flags for xl_smgr_truncate */
 #define SMGR_TRUNCATE_HEAP        0x0001
 #define SMGR_TRUNCATE_VM        0x0002
@@ -51,6 +58,7 @@ typedef struct xl_smgr_truncate
 } xl_smgr_truncate;
 
 extern void log_smgrcreate(const RelFileNode *rnode, ForkNumber forkNum);
+extern void log_smgrunlink(const RelFileNode *rnode, ForkNumber forkNum);
 
 extern void smgr_redo(XLogReaderState *record);
 extern void smgr_desc(StringInfo buf, XLogReaderState *record);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index ee91b8fa26..98967eeff9 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -205,6 +205,8 @@ extern void FlushRelationsAllBuffers(struct SMgrRelationData **smgrs, int nrels)
 extern void FlushDatabaseBuffers(Oid dbid);
 extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
                                    int nforks, BlockNumber *firstDelBlock);
+extern void SetRelFileNodeBuffersPersistence(RelFileNodeBackend rnode,
+                                             bool permanent);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index f28a842401..5d74631006 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -86,6 +86,7 @@ extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
 extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrunlink(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdosyncall(SMgrRelation *rels, int nrels);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,

Re: Disable WAL logging to speed up data loading

From
Fujii Masao
Date:

On 2020/10/02 10:06, Kyotaro Horiguchi wrote:
> At Thu, 1 Oct 2020 08:14:42 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in
>> Hi, Horiguchi-San and Fujii-San.
>>
>>
>> Thank you so much both of you.
>>>> the table needs to be rewriitten. One idea for that is to improve that
>>>> command so that it skips the table rewrite if wal_level=minimal.
>>>> Of course, also you can change wal_level after marking the table as
>>>> unlogged.
>>>
>>> tablecmd.c:
>> The idea is really interesting.
>> I didn't come up with getting rid of the whole copy of
>> the ALTER TABLE UNLOGGED/LOGGED commands
>> only when wal_level='minimal'.
>>
>>>> * There are two reasons for requiring a rewrite when changing
>>>> * persistence: on one hand, we need to ensure that the buffers
>>>> * belonging to each of the two relations are marked with or without
>>>> * BM_PERMANENT properly.  On the other hand, since rewriting creates
>>>> * and assigns a new relfilenode, we automatically create or drop an
>>>> * init fork for the relation as appropriate.
>> Thanks for sharing concrete comments in the source code.
>>
>>> According to this comment, perhaps we can do that at least for
>>> wal_level=minimal.
>> When I compare the 2 ideas,
>> one of the benefits of this ALTER TABLE 's improvement
>> is that we can't avoid the downtime
>> while that of wal_level='none' provides an easy and faster
>> major version up via output file of pg_dumpall.
> 
> The speedup has already been achieved with higher durability by
> wal_level=minimal in that case.

I was thinking the same, i.e., wal_level=minimal + wal_skip_threshold would
speed up that initial data loading.


>  Or maybe you should consider using
> pg_upgrade instead.  Even inducing the time to take a backup copy of
> the whole cluster, running pg_upgrade would be far faster than
> pg_dumpall then loading.
> 
>> Both ideas have good points.
>> However, actually to modify ALTER TABLE's copy
>> looks far more difficult than wal_level='none' and
>> beyond my current ability.
>> So, I'd like to go forward with the direction of wal_level='none'.
>> Did you have strong objections for this direction ?

No, I have no strong objection against your trial. But I was thinking
that it's not so easy to design and implement wal_level=none.
For example, there are some functions and commands depending on
the existence of WAL, like pg_switch_wal(), PREPARE TRANSACTION
and COMMIT PREPARED. Probably you need to define how they should
work in wal_level=none, e.g., emit an error.


> For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> trials of several ways, I drifted to the following way after poking
> several ways.

Nice!


> 1. Flip BM_PERMANENT of active buffers
> 2. adding/removing init fork
> 3. sync files,
> 4. Flip pg_class.relpersistence.
> 
> It always skips table copy in the SET UNLOGGED case,

Even in wal_level != minimal?
What happens in the standby side when SET UNLOGGED is executed without
the table rewrite in the primary? The table data should be truncated
in the standby?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Fri, 2 Oct 2020 10:56:21 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> On 2020/10/02 10:06, Kyotaro Horiguchi wrote:
> > At Thu, 1 Oct 2020 08:14:42 +0000, "osumi.takamichi@fujitsu.com"
> > <osumi.takamichi@fujitsu.com> wrote in
> >> When I compare the 2 ideas,
> >> one of the benefits of this ALTER TABLE 's improvement
> >> is that we can't avoid the downtime
> >> while that of wal_level='none' provides an easy and faster
> >> major version up via output file of pg_dumpall.
> > The speedup has already been achieved with higher durability by
> > wal_level=minimal in that case.
> 
> I was thinking the same, i.e., wal_level=minimal + wal_skip_threshold
> would
> speed up that initial data loading.

Yeah.

> >  Or maybe you should consider using
> > pg_upgrade instead.  Even inducing the time to take a backup copy of
> > the whole cluster, running pg_upgrade would be far faster than
> > pg_dumpall then loading.
> > 
> >> Both ideas have good points.
> >> However, actually to modify ALTER TABLE's copy
> >> looks far more difficult than wal_level='none' and
> >> beyond my current ability.
> >> So, I'd like to go forward with the direction of wal_level='none'.
> >> Did you have strong objections for this direction ?
> 
> No, I have no strong objection against your trial. But I was thinking
> that it's not so easy to design and implement wal_level=none.
> For example, there are some functions and commands depending on
> the existence of WAL, like pg_switch_wal(), PREPARE TRANSACTION
> and COMMIT PREPARED. Probably you need to define how they should
> work in wal_level=none, e.g., emit an error.
> 
> 
> > For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> > ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> > trials of several ways, I drifted to the following way after poking
> > several ways.
> 
> Nice!
> 
> 
> > 1. Flip BM_PERMANENT of active buffers
> > 2. adding/removing init fork
> > 3. sync files,
> > 4. Flip pg_class.relpersistence.
> > It always skips table copy in the SET UNLOGGED case,
> 
> Even in wal_level != minimal?
> What happens in the standby side when SET UNLOGGED is executed without
> the table rewrite in the primary? The table data should be truncated
> in the standby?

A table turned into unlogged on the primary is also turned into
unlogged on the standby and it is inaccessible on the standby. Maybe
the storage is dropped on both patched and unpatched versoins.

After the table is again turned into logged, the content is
transferred via WAL records generated from the insertions into the new
storage and it rebuilds the same storage on the standby on both
patched and unpatched.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
Sorry for the slippery brain...

At Fri, 02 Oct 2020 13:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Fri, 2 Oct 2020 10:56:21 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > 
> > On 2020/10/02 10:06, Kyotaro Horiguchi wrote:
> > > 1. Flip BM_PERMANENT of active buffers
> > > 2. adding/removing init fork
> > > 3. sync files,
> > > 4. Flip pg_class.relpersistence.
> > > It always skips table copy in the SET UNLOGGED case,
> > 
> > Even in wal_level != minimal?
> > What happens in the standby side when SET UNLOGGED is executed without
> > the table rewrite in the primary? The table data should be truncated
> > in the standby?
> 
> A table turned into unlogged on the primary is also turned into
> unlogged on the standby and it is inaccessible on the standby.

> Maybe the storage is dropped on both patched and unpatched versoins.

Maybe the storage dropped on unpatched and left alone on patched.

> After the table is again turned into logged, the content is
> transferred via WAL records generated from the insertions into the new
> storage and it rebuilds the same storage on the standby on both
> patched and unpatched.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Fujii Masao <masao.fujii@oss.nttdata.com>
> > The speedup has already been achieved with higher durability by
> > wal_level=minimal in that case.
>
> I was thinking the same, i.e., wal_level=minimal + wal_skip_threshold would
> speed up that initial data loading.

First of all, thank you Horiguchi-san for trying to improve ALTER TABLE SET UNLOGGED/LOGGED.  That should also be
appealing.

At the same time, as I said before, both features have good points.  TBH, as a user, I'm kind of attracted by MySQL's
approachbecause of its simplicity for users (although DBMS developers may be worried about this and that.)  What tempts
meis that I can just switch on the feature with a single configuration parameter, and continue to use existing SQL
scriptsand other data integration software without knowing what tables those load data into.  In the same context, I
don'thave to add or delete ALTER TABLE statements when I have to change the set of tables to be loaded.  For the same
reason,I'm also interested in Oracle's another feature ALTER TABLESPACE LOGGING/NOLOGGING. 

BTW, does ALTER TABLE LOGGED/UNLOGGED on a partitioned table get the change to its all partitions?  It would be a bit
tediousto add/delete ALTER TABLE LOGGED/UNLOGGED when I add/drop a partition. 

Regarding data migration, data movement is not limited only to major upgrades.  It will be convenient to speed up the
migrationof the entire database cluster into a new instance for testing and new deployment.  (I'm not sure about recent
pg_upgrade,but pg_upgrade sometimes cannot upgrade too older versions.) 

To conclude, I hope both features will be realized, and wish we won't fall in a situation where the words fly such as
"Mineis enough. Yours is risky and not necessary." 

With that said, I think we may as well separate the thread some time later for CF entry.  Otherwise, we will have
troublein finding the latest patch from the CF entry. 


> No, I have no strong objection against your trial. But I was thinking
> that it's not so easy to design and implement wal_level=none.
> For example, there are some functions and commands depending on
> the existence of WAL, like pg_switch_wal(), PREPARE TRANSACTION
> and COMMIT PREPARED. Probably you need to define how they should
> work in wal_level=none, e.g., emit an error.

    Yeah, we thought pg_switch_wal() may need some treatment.  We'll check PREPARE and COMMIT PREPARED as well.  I'd
appreciateit if you share what you notice at any time.  It is possible that we should emit WAL records of some resource
managers,like the bootstrap mode emits WAL only for RM_XLOG_ID. 


Regards
Takayuki Tsunakawa






Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Fri, 02 Oct 2020 13:51:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Sorry for the slippery brain...  ^2

> At Fri, 02 Oct 2020 13:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > At Fri, 2 Oct 2020 10:56:21 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > > 
> > > On 2020/10/02 10:06, Kyotaro Horiguchi wrote:
> > > > 1. Flip BM_PERMANENT of active buffers
> > > > 2. adding/removing init fork
> > > > 3. sync files,
> > > > 4. Flip pg_class.relpersistence.
> > > > It always skips table copy in the SET UNLOGGED case,
> > > 
> > > Even in wal_level != minimal?
> > > What happens in the standby side when SET UNLOGGED is executed without
> > > the table rewrite in the primary? The table data should be truncated
> > > in the standby?
> > 
> > A table turned into unlogged on the primary is also turned into
> > unlogged on the standby and it is inaccessible on the standby.

> Maybe the storage dropped on unpatched

Maybe the old storage is replaced with an empty stroage on unpatched.

> and left alone on patched.

> > After the table is again turned into logged, the content is
> > transferred via WAL records generated from the insertions into the new
> > storage and it rebuilds the same storage on the standby on both
> > patched and unpatched.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi,


I wrote and attached the first patch to disable WAL logging.
This patch passes the regression test of check-world already
and is formatted by pgindent.

Also, I conducted 3 types of performance tests to
clarify the rough benefits of the patch.

I compared two wal_levels both 'minimal' and 'none'.
For both levels, I measured
(1) cluster's restore from pg_dumpall output,
(2) COPY initial data to a defined table as initial data loading, and
(3) COPY data to a defined table with tuples, 3 times each for all cases.
After that, calculated the average and the ratio of the loading speed.
The conclusion is that wal_level=none cuts about 20% of the loading speed
compared to 'minimal' in the three cases above. For sure, we could tune the configuration of
postgresql.conf further but roughly it's meaningful to share the result, I think.
'shared_buffers' was set to 40% of RAM while 'maintenance_work_mem'
was set to 20%. I set max_wal_senders = 0 this time.

The input data was generated from pgbench with 1000 scale factor.
It's about 9.3GB. For the table definition or
the initial data for appended data loading test case,
I used pgbench to set up the schema as well.
Sharing other scenario to measure is welcome.

I need to say that the current patch doesn't take the change of wal_level='none'
from/to other ones into account fully or other commands like ones for two phase commit yet.
Sorry for that.

Also, to return the value of last shut down LSN in XLogInsert(),
it acquires lock of control file every time, which was not good clearly.
I tried to replace that code like having a LSN cache as one variable
but I was not sure where I should put the function to set the initial value of the LSN.
After LocalProcessControlFile() in PostmasterMain() was not the right place.
I'd be happy if someone gives me an advice about it.


Best,
    Takamichi Osumi

Attachment

Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Wed, 2020-10-28 at 04:11 +0000, osumi.takamichi@fujitsu.com wrote:
> I wrote and attached the first patch to disable WAL logging.
> This patch passes the regression test of check-world already
> and is formatted by pgindent.

Without reading the code, I have my doubts about that feature.

While it clearly will improve performance, it opens the door to
data loss.  People will use it to speed up their data loads and
then be unhappy if they cannot use their backups to recover from
a problem.

What happens if you try to do archive recovery across a time where
wal_level was "none"?  Will the recovery process fail, as it should,
or will you end up with data corruption?

We already have a performance-related footgun in the shape of
fsync = off.  Do we want to add another one?

Yours,
Laurenz Albe




RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> I wrote and attached the first patch to disable WAL logging.
> This patch passes the regression test of check-world already and is formatted

I think make check-world uses the default setting for wal_level.  You need to set wal_level = none and do make
installcheck-world.


> I compared two wal_levels both 'minimal' and 'none'.
> For both levels, I measured
> (1) cluster's restore from pg_dumpall output,
> (2) COPY initial data to a defined table as initial data loading, and
> (3) COPY data to a defined table with tuples, 3 times each for all cases.
> After that, calculated the average and the ratio of the loading speed.
> The conclusion is that wal_level=none cuts about 20% of the loading speed
> compared to 'minimal' in the three cases above.

Hmm.  I wonder why pg_dumpall's performance increases by as much as 20%.  On the contrary to my previous hope,
pg_dumpalluses COPY to restore data, so it doesn't emit WAL when wal_level = minimal.  (Is it brought by the difference
ofwhether DDL's WAL is emitted or not?) 


> Sharing other scenario to measure is welcome.

How about running multiple concurrent data loading sessions when adding data to existing tables with data, so that WAL
isthe bottleneck?  That's the use case of the target customer, isn't it? 



> The input data was generated from pgbench with 1000 scale factor.
> It's about 9.3GB. For the table definition or the initial data for appended data

IIRC, I thought the scale factor of 1,000 is 1.5 GB.  What displayed 9.3 GB?  SELECT pg_database_size() or something?


Below are review comments:


(1)
@@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
         return EndPos;
     }

+    /* Issues WAL only for transaction end and check point */
+    if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
+    {
+        XLogResetInsertion();
+        return GetLatestCheckPointLSN();
+    }
+
     do

This does not emit transaction completion WAL records.  Their RM ID is RM_XACT_ID.  Also, RM_XLOG_ID includes other
kindsof WAL records than the checkpoint WAL record.  Correct the comment accordingly.  I don't have a good idea on how
torepresent the RM_XLOG_ID,, but the following might help: 

[rmgrlist.h]
/* symbol name, textual name, redo, desc, identify, startup, cleanup */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, NULL)
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL, NULL)


(2)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
@@ -2591,10 +2591,10 @@ include_dir 'conf.d'
         data to support WAL archiving and replication, including running
         read-only queries on a standby server. <literal>minimal</literal> removes all
         logging except the information required to recover from a crash or
-        immediate shutdown.  Finally,
+        immediate shutdown.  <literal>none</literal> generates no WAL in any case. Finally,

According to the previous code, "none" emits some kinds of WAL records.  So I think we need to come up with a good name
and/ordescription. 



(3)
         <literal>logical</literal> adds information necessary to support logical
-        decoding.  Each level includes the information logged at all lower
-        levels.  This parameter can only be set at server start.
+        decoding.  Each level except for <literal>none</literal> includes the
+        information logged at all lower levels.  This parameter can only be set at server start.
        </para>

Why is this change necessary?


(4)
+        On the other hand, an unexpected crash of the server makes the database cluster
+        inconsistent. For that reason, before utilizing this level, get a full backup of the cluster and
+        backup of the entire operations that are done under the condition that
+        <varname>wal_level</varname> is <literal>none</literal>.

This gives the impression that the user can start the database server and see inconsistent data.  The reality is that
thedatabase server does not start, isn't it? 


(5)
@@ -1751,7 +1752,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     Aside from avoiding the time for the archiver or WAL sender to process the
     WAL data, doing this will actually make certain commands faster, because
     they do not to write WAL at all if <varname>wal_level</varname>
-    is <literal>minimal</literal> and the current subtransaction (or top-level
+    is either <literal>minimal</literal> or <literal>minimal</literal>
+    and the current subtransaction (or top-level
     transaction) created or truncated the table or index they change.  (They
     can guarantee crash safety more cheaply by doing
     an <function>fsync</function> at the end than by writing WAL.)

This is not correct.  In minimal, some conditions need to hold true for WAL to not be generated as described above.
OTOH,wal_level = none does not generate WAL unconditionally. 


(6)
14.4.9. Some Notes about pg_dump
...
If using WAL archiving or streaming replication, consider disabling them during the restore. To do that, set
archive_modeto off, wal_level to minimal, and max_wal_senders to zero before loading the dump. Afterwards, set them
backto the right values and take a fresh base backup. 

Why don't you refer to wal_level = none here as well?


(7)
@@ -918,10 +918,13 @@ PostmasterMain(int argc, char *argv[])
                      ReservedBackends, MaxConnections);
         ExitPostmaster(1);
     }
-    if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL)
+    if ((XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_NONE) ||
+        (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL))
         ereport(ERROR,
-                (errmsg("WAL archival cannot be enabled when wal_level is \"minimal\"")));
-    if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
+                (errmsg("WAL archival cannot be enabled when wal_level is \"%s\"",
+                        wal_level == WAL_LEVEL_MINIMAL ? "minimal" : "none")));
+    if ((max_wal_senders > 0 && wal_level == WAL_LEVEL_NONE) ||
+        (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL))
         ereport(ERROR,
                 (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));

This style of writing conitions is redundant.  You can just change the code to wal_level <= WAL_LEVEL_MINIMAL.
Also, the first message can be '"minimal" or "none"' like the second one.


(8)
@@ -989,6 +992,12 @@ PostmasterMain(int argc, char *argv[])
     LocalProcessControlFile(false);

     /*
+     * Check some conditions specific to wal_level='none' and ensures the
+     * database isn't inconsistent.
+     */
+    SafelyFinishedNoneLevel();
+
+    /*

This check should be moved to around the beginning of StartupXLOG().  PostmasterMain() is called in multi-user mode.
It'snot called in single-user mode (postgres --single). 
The new function is not necessary.  StartupXLOG() can see the control file contents directly.



(9)
+    /*
+     * Detect if we previously crashed under wal_level='none' or not.
+     */
+    unexpected_shutdown = ControlFile->state != DB_SHUTDOWNED &&
+        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY;
+    if ((ControlFile->wal_level == WAL_LEVEL_NONE && unexpected_shutdown))
+    {
+        ereport(ERROR,
+                (errmsg("Server was unexpectedly shut down when WAL logging was disabled"),
+                 errhint("It looks like you need to deploy a new cluster from your full backup again.")));
+    }
+}

Refine the message according to the following message guideline.  For example, the primary message has to start with a
lowercase letter. 

https://www.postgresql.org/docs/devel/source.html



(10)
You need to edit this to add none:
src/backend/utils/misc/postgresql.conf.sample


(11)
src/include/access/xlogdefs.h
src/backend/access/transam/varsup.c

Consider modifying the comments in these files that refer to wal_level.  Maybe wal_level <= minimal is enough?


(12)
src/include/utils/rel.h

Modify the RelationNeedsWAL() so that it returns false when wal_level = none.  Adding wal_level != WAL_LEVEL_NONE is
wouldprobably be okay. 


(13)
@@ -161,7 +161,8 @@ extern int    XLogArchiveMode;
 /* WAL levels */
 typedef enum WalLevel
 {
-    WAL_LEVEL_MINIMAL = 0,
+    WAL_LEVEL_NONE = 0,
+    WAL_LEVEL_MINIMAL,
     WAL_LEVEL_REPLICA,
     WAL_LEVEL_LOGICAL
 } WalLevel;

I'm a bit concerned about if we can change the values of existing symbols, because wal_level setting is stored in
pg_controlfile.  Having a quick look at source code, there seems to be no problem.  However, I'm not sure about
pg_upgrade. Can you try, for example, 

1. Create the database cluster with an older version, say, PG 13.
2. Start and stop the database server.
3. Run pg_controldata and see that it prints replica for the wal_level value.
4. Upgrade the database cluster with pg_upgrade.
5. Run pg_controldata and see the wal_level value.

If you change the values of existing symbols like your patch, you need to bump PG_CONTROL_VERSION.
If you find we can't tchange the existing values, you can probably set WAL_LEVEL_NONE to -1.


Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi, Laurenz


> > I wrote and attached the first patch to disable WAL logging.
> > This patch passes the regression test of check-world already and is
> > formatted by pgindent.
> 
> Without reading the code, I have my doubts about that feature.
> 
> While it clearly will improve performance, it opens the door to data loss.
Therefore, this feature must avoid that
that kind of inconsistent server starts up again.
This has been discussed in this thread already.

> People will use it to speed up their data loads and then be unhappy if they
> cannot use their backups to recover from a problem.
> 
> What happens if you try to do archive recovery across a time where wal_level
> was "none"?  Will the recovery process fail, as it should, or will you end up
> with data corruption?
> 
> We already have a performance-related footgun in the shape of fsync = off.
> Do we want to add another one?
Further, in this thread, we discuss that 
this feature is intended to serve under
some specific opportunities like DBA wants
to load data as soon as possible and/or the operation itself is easily *repeatable*.
So, before and after the change of wal_level, DBA needs to take a full backup to
prepare the unexpected crash.

But anyway, I'll fix and enrich the documents. Thanks.

Best,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Wed, 2020-10-28 at 09:55 +0000, osumi.takamichi@fujitsu.com wrote:
> > > I wrote and attached the first patch to disable WAL logging.
> > > This patch passes the regression test of check-world already and is
> > > formatted by pgindent.
> >
> > Without reading the code, I have my doubts about that feature.
> > While it clearly will improve performance, it opens the door to data loss.
> 
> Therefore, this feature must avoid that
> that kind of inconsistent server starts up again.
> This has been discussed in this thread already.
>
> > People will use it to speed up their data loads and then be unhappy if they
> > cannot use their backups to recover from a problem.
> > What happens if you try to do archive recovery across a time where wal_level
> > was "none"?  Will the recovery process fail, as it should, or will you end up
> > with data corruption?
> > We already have a performance-related footgun in the shape of fsync = off.
> > Do we want to add another one?
> 
> Further, in this thread, we discuss that 
> this feature is intended to serve under
> some specific opportunities like DBA wants
> to load data as soon as possible and/or the operation itself is easily *repeatable*.
> So, before and after the change of wal_level, DBA needs to take a full backup to
> prepare the unexpected crash.
> 
> But anyway, I'll fix and enrich the documents. Thanks.

I read through the thread and the patch now.

The only safety I see is that startup after a crash is prevented.

But what if someone sets wal_level=none, performs some data modifications,
sets wal_level=archive and after dome more processing decides to restore from
a backup that was taken before the cluster was set to wal_level=none?
Then they would end up with a corrupted database, right?

I think the least this patch needs is that starting with wal_level=none emits
a WAL record that will make recovery fail.

I am aware that this is intended for "specific opportunities", but we still
should make it as hard as possible for the user to cause harm.  It may be that
MySQL, which inspired this feature, does not care about that, but I think we
should do better.

Another point that makes me worry is that this feature will unconditionally
break all replication, and there is not the least mention of that in the
documentation.

Yours,
Laurenz Albe




Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Wed, 2020-10-28 at 14:05 +0100, I wrote:
> But what if someone sets wal_level=none, performs some data modifications,
> sets wal_level=archive and after dome more processing decides to restore from
> a backup that was taken before the cluster was set to wal_level=none?
> Then they would end up with a corrupted database, right?
> 
> I think the least this patch needs is that starting with wal_level=none emits
> a WAL record that will make recovery fail.

I just realized that changing "wal_level" will cause a WAL record anyway.
Besides, the situation is not much different from changing to "wal_level = minimal".
So as long as PostgreSQL refuses to start after a crash, we should be good.

Sorry for the noise, and I am beginning to think that this is actually
a useful feature.

Yours,
Laurenz Albe




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello


> > But what if someone sets wal_level=none, performs some data
> > modifications, sets wal_level=archive and after dome more processing
> > decides to restore from a backup that was taken before the cluster was set
> to wal_level=none?
> > Then they would end up with a corrupted database, right?
> >
> > I think the least this patch needs is that starting with
> > wal_level=none emits a WAL record that will make recovery fail.
> 
> I just realized that changing "wal_level" will cause a WAL record anyway.
> Besides, the situation is not much different from changing to "wal_level =
> minimal".
> So as long as PostgreSQL refuses to start after a crash, we should be good.
> 
> Sorry for the noise, and I am beginning to think that this is actually a useful
> feature.
No problem at all.
Probably, for some developers, was the name "none" confusing ?

Also, thank you for your pointing out my lack of explanation
in the documents of the replication in the previous e-mail.


Best,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Fujii Masao
Date:

On 2020/10/28 22:05, Laurenz Albe wrote:
> On Wed, 2020-10-28 at 09:55 +0000, osumi.takamichi@fujitsu.com wrote:
>>>> I wrote and attached the first patch to disable WAL logging.
>>>> This patch passes the regression test of check-world already and is
>>>> formatted by pgindent.
>>>
>>> Without reading the code, I have my doubts about that feature.
>>> While it clearly will improve performance, it opens the door to data loss.
>>
>> Therefore, this feature must avoid that
>> that kind of inconsistent server starts up again.
>> This has been discussed in this thread already.
>>
>>> People will use it to speed up their data loads and then be unhappy if they
>>> cannot use their backups to recover from a problem.
>>> What happens if you try to do archive recovery across a time where wal_level
>>> was "none"?  Will the recovery process fail, as it should, or will you end up
>>> with data corruption?
>>> We already have a performance-related footgun in the shape of fsync = off.
>>> Do we want to add another one?
>>
>> Further, in this thread, we discuss that
>> this feature is intended to serve under
>> some specific opportunities like DBA wants
>> to load data as soon as possible and/or the operation itself is easily *repeatable*.
>> So, before and after the change of wal_level, DBA needs to take a full backup to
>> prepare the unexpected crash.
>>
>> But anyway, I'll fix and enrich the documents. Thanks.
> 
> I read through the thread and the patch now.
> 
> The only safety I see is that startup after a crash is prevented.
> 
> But what if someone sets wal_level=none, performs some data modifications,
> sets wal_level=archive and after dome more processing decides to restore from
> a backup that was taken before the cluster was set to wal_level=none?
> Then they would end up with a corrupted database, right?

I think that the guard to prevent the server from starting up from
the corrupted database in that senario is necessary.

> 
> I think the least this patch needs is that starting with wal_level=none emits
> a WAL record that will make recovery fail.
> 
> I am aware that this is intended for "specific opportunities", but we still
> should make it as hard as possible for the user to cause harm.  It may be that
> MySQL, which inspired this feature, does not care about that, but I think we
> should do better.

I'm still not sure if it's worth supporting this feature in core.
Because it can really really easily cause users to corrupt whole the database.

BTW, with the patch, I observed that PREPARE TRANSACTION and
COMMIT PREPARED caused assertion failure in my env, as I pointed upthread.

How does the patch handle other feature depending on the existence of WAL,
e.g., pg_logical_emit_message()?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Thu, 2020-10-29 at 11:42 +0900, Fujii Masao wrote:
> > But what if someone sets wal_level=none, performs some data modifications,
> > sets wal_level=archive and after dome more processing decides to restore from
> > a backup that was taken before the cluster was set to wal_level=none?
> > Then they would end up with a corrupted database, right?
> 
> I think that the guard to prevent the server from starting up from
> the corrupted database in that senario is necessary.

That wouldn't apply, I think, because the backup from which you start
was taken with wal_level = replica, so the guard wouldn't alert.

But as I said in the other thread, changing wal_level emits a WAL
record, and I am sure that recovery will refuse to proceed if
wal_level < replica.

> I'm still not sure if it's worth supporting this feature in core.
> Because it can really really easily cause users to corrupt whole the database.

You mean, if they take no backup before setting wal_level = none
and then crash the database, so that they are stuck with an
unrecoverable database?

Yes, that feels somewhat too fast and loose...
It would at least require some fat warnings in the documentation
and in postgresql.conf.

Yours,
Laurenz Albe




Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Thu, 2020-10-29 at 00:07 +0000, osumi.takamichi@fujitsu.com wrote:
> > Sorry for the noise, and I am beginning to think that this is actually a useful
> > feature.
> 
> No problem at all.
> Probably, for some developers, was the name "none" confusing ?

No, I think that "none" is quite accurate.

The worry is that it makes it quite easy for people to end up with
a crashed database that cannot start - at the very least there should be
warnings in the documentation and postgresql.conf that are as dire as
for "fsync = off".

Yours,
Laurenz Albe




Re: Disable WAL logging to speed up data loading

From
Fujii Masao
Date:

On 2020/10/29 19:21, Laurenz Albe wrote:
> On Thu, 2020-10-29 at 11:42 +0900, Fujii Masao wrote:
>>> But what if someone sets wal_level=none, performs some data modifications,
>>> sets wal_level=archive and after dome more processing decides to restore from
>>> a backup that was taken before the cluster was set to wal_level=none?
>>> Then they would end up with a corrupted database, right?
>>
>> I think that the guard to prevent the server from starting up from
>> the corrupted database in that senario is necessary.
> 
> That wouldn't apply, I think, because the backup from which you start
> was taken with wal_level = replica, so the guard wouldn't alert.
> 
> But as I said in the other thread, changing wal_level emits a WAL
> record, and I am sure that recovery will refuse to proceed if
> wal_level < replica.

Yes. What I meant was such a safe guard needs to be implemented.

This may mean that if we want to recover the database from that backup,
we need to specify the recovery target so that the archive recovery stops
just before the WAL record indicating wal_level change.

> 
>> I'm still not sure if it's worth supporting this feature in core.
>> Because it can really really easily cause users to corrupt whole the database.
> 
> You mean, if they take no backup before setting wal_level = none
> and then crash the database, so that they are stuck with an
> unrecoverable database?

Yes. Also if the safe guard that we discussed the above is missing,
even when a backup is taken before wal_level=none, recovery from
the backup can make the database corrupted.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Disable WAL logging to speed up data loading

From
Masahiko Sawada
Date:
On Fri, 30 Oct 2020 at 05:00, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/10/29 19:21, Laurenz Albe wrote:
> > On Thu, 2020-10-29 at 11:42 +0900, Fujii Masao wrote:
> >>> But what if someone sets wal_level=none, performs some data modifications,
> >>> sets wal_level=archive and after dome more processing decides to restore from
> >>> a backup that was taken before the cluster was set to wal_level=none?
> >>> Then they would end up with a corrupted database, right?
> >>
> >> I think that the guard to prevent the server from starting up from
> >> the corrupted database in that senario is necessary.
> >
> > That wouldn't apply, I think, because the backup from which you start
> > was taken with wal_level = replica, so the guard wouldn't alert.
> >
> > But as I said in the other thread, changing wal_level emits a WAL
> > record, and I am sure that recovery will refuse to proceed if
> > wal_level < replica.
>
> Yes. What I meant was such a safe guard needs to be implemented.
>
> This may mean that if we want to recover the database from that backup,
> we need to specify the recovery target so that the archive recovery stops
> just before the WAL record indicating wal_level change.

Yeah, it also means that setting wal_level to none makes the previous
backup no use even if the user has some generations of backup.

Does it make things simple if the usage of wal_level = 'none' is
limited to initial data loading for example? I mean we add a special
flag to initdb that sets wal_level to 'none' after initialization and
the user does initial data loading and set wal_level to >= minimal.
That is, we allow users to set from none to >= minimal but not for the
reverse. Since we prevent the database cluster from backup when
wal_level is none, the recovery never across wal_level = none. Not
sure this idea can address the case Osumi-san concerned though.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Fri, 2020-10-30 at 05:00 +0900, Fujii Masao wrote:
> > But as I said in the other thread, changing wal_level emits a WAL
> > record, and I am sure that recovery will refuse to proceed if
> > wal_level < replica.
> 
> Yes. What I meant was such a safe guard needs to be implemented.

That should be easy; just modify this code:

static void
CheckRequiredParameterValues(void)
{
    /*
     * For archive recovery, the WAL must be generated with at least 'replica'
     * wal_level.
     */
    if (ArchiveRecoveryRequested &&
ControlFile->wal_level == WAL_LEVEL_MINIMAL)
    {
        ereport(WARNING,
                (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
                 errhint("This happens
if you temporarily set wal_level=minimal without taking a new base backup.")));
    }

so that it tests

    if (ArchiveRecoveryRequested && ControlFile->wal_level <= WAL_LEVEL_MINIMAL)

and we should be safe.

Yours,
Laurenz Albe




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello


On Friday, October 30, 2020 1:32 PM Masahiko Sawada wrote:
> > On 2020/10/29 19:21, Laurenz Albe wrote:
> > > On Thu, 2020-10-29 at 11:42 +0900, Fujii Masao wrote:
> > >>> But what if someone sets wal_level=none, performs some data
> > >>> modifications, sets wal_level=archive and after dome more
> > >>> processing decides to restore from a backup that was taken before
> the cluster was set to wal_level=none?
> > >>> Then they would end up with a corrupted database, right?
> > >>
> > >> I think that the guard to prevent the server from starting up from
> > >> the corrupted database in that senario is necessary.
> > >
> > > That wouldn't apply, I think, because the backup from which you
> > > start was taken with wal_level = replica, so the guard wouldn't alert.
> > >
> > > But as I said in the other thread, changing wal_level emits a WAL
> > > record, and I am sure that recovery will refuse to proceed if
> > > wal_level < replica.
> >
> > Yes. What I meant was such a safe guard needs to be implemented.
> >
> > This may mean that if we want to recover the database from that
> > backup, we need to specify the recovery target so that the archive
> > recovery stops just before the WAL record indicating wal_level change.
> 
> Yeah, it also means that setting wal_level to none makes the previous backup
> no use even if the user has some generations of backup.
> 
> Does it make things simple if the usage of wal_level = 'none' is limited to
> initial data loading for example? I mean we add a special flag to initdb that
> sets wal_level to 'none' after initialization and the user does initial data
> loading and set wal_level to >= minimal.
> That is, we allow users to set from none to >= minimal but not for the reverse.
> Since we prevent the database cluster from backup when wal_level is none,
> the recovery never across wal_level = none. Not sure this idea can address
> the case Osumi-san concerned though.
Thanks you so much for the discussion.

Hmm, this was a good idea to implement
some kind of valve to prevent backflow of wal_levels.
But, ideally I'd like to allow users to switch wal_levels from/to 'none',
in order to let them operate bulk load in a faster way by this feature
even if that operation isn't for the initial data loading.


Regards,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Robert Haas
Date:
On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Yes. What I meant was such a safe guard needs to be implemented.
>
> This may mean that if we want to recover the database from that backup,
> we need to specify the recovery target so that the archive recovery stops
> just before the WAL record indicating wal_level change.

Yeah, I think we need these kinds of safeguards, for sure.

I'm also concerned about the way that this proposed feature interacts
with incremental backup capabilities that already exist in tools like
pgBackRest, EDB's BART, pg_probackup, and future things we might want
to introduce into core, along the lines of what I have previously
proposed. Now, I think pgBackRest uses only timestamps and checksums,
so it probably doesn't care, but some of the other solutions rely on
WAL-scanning to gather a list of changed blocks. I guess there's no
reason that they can't notice the wal_level being changed and do the
right thing; they should probably have that kind of capability
already. Still, it strikes me that it might be useful if we had a
stronger mechanism.

I'm not exactly sure what that would look like, but suppose we had a
feature where every time wal_level drops below replica, a counter gets
incremented by 1, and that counter is saved in the control file. Or
maybe when wal_level drops below minimal to none. Or maybe there are
two counters. Anyway, the idea is that if you have a snapshot of the
cluster at one time and a snapshot at another time, you can see
whether anything scary has happened in the middle without needing all
of the WAL in between.

Maybe this is off-topic for this thread or not really needed, but I'm
not sure. I don't think wal_level=none is a bad idea intrinsically,
but I think it would be easy to implement it poorly and end up harming
a lot of users. I have no problem with giving people a way to do
dangerous things, but we should do our best to let people know how
much danger they've incurred.

By the way, another problem here is that some AMs - e.g. GiST, IIRC -
use LSNs to figure out whether a block has changed. For temporary and
unlogged tables, we use "fake" LSNs that are generated using a
counter, but that approach only works because such relations are never
really WAL-logged. Mixing fake LSNs and real LSNs will break stuff,
and not bumping the LSN when the page changes probably will, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Disable WAL logging to speed up data loading

From
Magnus Hagander
Date:
On Mon, Nov 2, 2020 at 4:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Yes. What I meant was such a safe guard needs to be implemented.
> >
> > This may mean that if we want to recover the database from that backup,
> > we need to specify the recovery target so that the archive recovery stops
> > just before the WAL record indicating wal_level change.
>
> Yeah, I think we need these kinds of safeguards, for sure.
>
> I'm also concerned about the way that this proposed feature interacts
> with incremental backup capabilities that already exist in tools like
> pgBackRest, EDB's BART, pg_probackup, and future things we might want
> to introduce into core, along the lines of what I have previously
> proposed. Now, I think pgBackRest uses only timestamps and checksums,
> so it probably doesn't care, but some of the other solutions rely on
> WAL-scanning to gather a list of changed blocks. I guess there's no
> reason that they can't notice the wal_level being changed and do the
> right thing; they should probably have that kind of capability
> already. Still, it strikes me that it might be useful if we had a
> stronger mechanism.
>
> I'm not exactly sure what that would look like, but suppose we had a
> feature where every time wal_level drops below replica, a counter gets
> incremented by 1, and that counter is saved in the control file. Or
> maybe when wal_level drops below minimal to none. Or maybe there are
> two counters. Anyway, the idea is that if you have a snapshot of the
> cluster at one time and a snapshot at another time, you can see
> whether anything scary has happened in the middle without needing all
> of the WAL in between.
>
> Maybe this is off-topic for this thread or not really needed, but I'm
> not sure. I don't think wal_level=none is a bad idea intrinsically,
> but I think it would be easy to implement it poorly and end up harming
> a lot of users. I have no problem with giving people a way to do
> dangerous things, but we should do our best to let people know how
> much danger they've incurred.

I definitely think this is something that should be thought out and
included in a patch like this, so it's definitely on-topic for this
thread.

Having the ability to turn things off can certainly be very useful.
Having the risk of having done so without realizing the damage caused
is a *big* foot-gun, and we need to do our best to protect against it.

This is not entirely unlike the idea that we've discussed before of
having basically a "tainted" flag in pg_control if the system has ever
been started up in say fsync=off, just to make sure that we have a
record of it. This wouldn't be the same flag of course, but it's a
similar problem, where even temporarily starting the cluster up with a
certain set of flags can do permanent damage which is not necessarily
fixed by changing it back and restarting.

This would also be something that should be exposed as monitoring
points (which it could be if it's in pg_control). That is, I can
imagine a *lot* of installations that would definitely want an alert
to fire if the cluster has ever been started up in a wal_level=none or
wal_level=minimal, at least up until the point where somebody has run
a new full backup.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> On Mon, Nov 2, 2020 at 4:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > > Yes. What I meant was such a safe guard needs to be implemented.
> > >
> > > This may mean that if we want to recover the database from that backup,
> > > we need to specify the recovery target so that the archive recovery stops
> > > just before the WAL record indicating wal_level change.
> >
> > Yeah, I think we need these kinds of safeguards, for sure.
> >
> > I'm also concerned about the way that this proposed feature interacts
> > with incremental backup capabilities that already exist in tools like
> > pgBackRest, EDB's BART, pg_probackup, and future things we might want
> > to introduce into core, along the lines of what I have previously
> > proposed. Now, I think pgBackRest uses only timestamps and checksums,
> > so it probably doesn't care, but some of the other solutions rely on
> > WAL-scanning to gather a list of changed blocks. I guess there's no
> > reason that they can't notice the wal_level being changed and do the
> > right thing; they should probably have that kind of capability
> > already. Still, it strikes me that it might be useful if we had a
> > stronger mechanism.

Checking the WAL level certainly seems critical to anything that's
reading the WAL.  We certainly do this already when running as a
replica:

ereport(WARNING,
        (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
        errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));

There's definitely a question about if a WARNING there is really
sufficient or not, considering that you could end up with 'logged'
tables on the replica that are missing data, but I'm not sure that
inventing a new, independent, mechanism for checking WAL level changes
makes sense.

While pgbackrest backups of a primary wouldn't be impacted, it does
support backing up from a replica (as do other backup systems, of
course) and if data that's supposed to be on the replica isn't there
because someone restarted PG with wal_level=minimal and then flipped it
back to replica and got the replica to move past that part of the WAL by
turning off hot standby, replaying, and then turning it back on, then
the backup is going to also be missing that data.

Perhaps that's where we need to have a stronger mechanism though- that
is, if we hit the above WARNING, ever, set a flag somewhere that tools
can check and which we could also check and throw further warnings
about.  In other words, every time this replica is started, we could
check for this flag and throw the same warning above about how data may
be missing, and we could have pg_basebackup flat out refuse to back up
from a replica where this flag has been set (maybe with some kind of
override mechanism...  maybe not).  After all, that's where the real
issue here is, isn't it?

> > I'm not exactly sure what that would look like, but suppose we had a
> > feature where every time wal_level drops below replica, a counter gets
> > incremented by 1, and that counter is saved in the control file. Or
> > maybe when wal_level drops below minimal to none. Or maybe there are
> > two counters. Anyway, the idea is that if you have a snapshot of the
> > cluster at one time and a snapshot at another time, you can see
> > whether anything scary has happened in the middle without needing all
> > of the WAL in between.
> >
> > Maybe this is off-topic for this thread or not really needed, but I'm
> > not sure. I don't think wal_level=none is a bad idea intrinsically,
> > but I think it would be easy to implement it poorly and end up harming
> > a lot of users. I have no problem with giving people a way to do
> > dangerous things, but we should do our best to let people know how
> > much danger they've incurred.

I'm not sure that wal_level=none is really the right way to address this
use-case.  We already have unlogged tables and that's pretty clean and
meets the "we want to load data fast without having to pay for WAL" use
case.  The argument here seems to be that to take advantage of unlogged
tables requires the tools using PG to know how to issue a 'CREATE
UNLOGGED TABLE' command instead of a 'CREATE TABLE' command.  That
doesn't seem like a huge leap, but we could make it easier by just
adding a 'table_log_default' or such GUC that could be set on the data
loading role to have all tables created by it be unlogged.

Looking back at the thread, it seems that perhaps the one other area
where this isn't ideal is when the user ultimately wants the data to be
*considered* logged even when it wasn't (ie: change from wal_level none,
or minimal, back to replica).  On a quick look, it seems that we're
missing a trick there for UNLOGGED -> LOGGED changes since, while I
don't think we'll WAL the table during that if wal_level is set to
minimal, I do think we'll still end up pointlessly making a copy of all
of the data when we don't really need to..?  Fixing that would be useful
in its own right though.

> I definitely think this is something that should be thought out and
> included in a patch like this, so it's definitely on-topic for this
> thread.

Considering it's an existing problem we have, I'm not entirely convinced
that it's on this patch to fix it beyond making sure to document the
impact.  That said, mucking with WAL level seems like a really big
hammer and it'd be nice if we could support this use case with more
granularity, as I try to outline above.

> Having the ability to turn things off can certainly be very useful.
> Having the risk of having done so without realizing the damage caused
> is a *big* foot-gun, and we need to do our best to protect against it.

I don't know that we really do enough in this regard, today, and maybe
that's something we should discuss changing, but that's not really a new
concern that this patch is introducing.

> This is not entirely unlike the idea that we've discussed before of
> having basically a "tainted" flag in pg_control if the system has ever
> been started up in say fsync=off, just to make sure that we have a
> record of it. This wouldn't be the same flag of course, but it's a
> similar problem, where even temporarily starting the cluster up with a
> certain set of flags can do permanent damage which is not necessarily
> fixed by changing it back and restarting.

While I think I understand the concern, I'm not sure that it entirely
applies to the fsync=off case unless the system isn't cleanly shut down
too, something that is usually caught and detected through other means.
Having the WAL level be lowered to minimal is something that can cause a
replica to end up getting corrupted even without an unclean restart or
similar happening.

> This would also be something that should be exposed as monitoring
> points (which it could be if it's in pg_control). That is, I can
> imagine a *lot* of installations that would definitely want an alert
> to fire if the cluster has ever been started up in a wal_level=none or
> wal_level=minimal, at least up until the point where somebody has run
> a new full backup.

I agree with the general idea that it'd be good for anything we do here
to be made available for monitoring tools to be able to see.

Thanks,

Stephen

Attachment

RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello, Stephen


On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost@snowman.net> wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
> > On Mon, Nov 2, 2020 at 4:28 PM Robert Haas <robertmhaas@gmail.com>
> wrote:
> > > On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
> > > > Yes. What I meant was such a safe guard needs to be implemented.
> > > >
> > > > This may mean that if we want to recover the database from that
> > > > backup, we need to specify the recovery target so that the archive
> > > > recovery stops just before the WAL record indicating wal_level change.
> > >
> > > Yeah, I think we need these kinds of safeguards, for sure.
> > >
> > > I'm also concerned about the way that this proposed feature
> > > interacts with incremental backup capabilities that already exist in
> > > tools like pgBackRest, EDB's BART, pg_probackup, and future things
> > > we might want to introduce into core, along the lines of what I have
> > > previously proposed. Now, I think pgBackRest uses only timestamps
> > > and checksums, so it probably doesn't care, but some of the other
> > > solutions rely on WAL-scanning to gather a list of changed blocks. I
> > > guess there's no reason that they can't notice the wal_level being
> > > changed and do the right thing; they should probably have that kind
> > > of capability already. Still, it strikes me that it might be useful
> > > if we had a stronger mechanism.
>
> Checking the WAL level certainly seems critical to anything that's reading the
> WAL.  We certainly do this already when running as a
> replica:
>
> ereport(WARNING,
>         (errmsg("WAL was generated with wal_level=minimal, data may be
> missing"),
>         errhint("This happens if you temporarily set wal_level=minimal
> without taking a new base backup.")));
>
> There's definitely a question about if a WARNING there is really sufficient or
> not, considering that you could end up with 'logged'
> tables on the replica that are missing data, but I'm not sure that inventing a
> new, independent, mechanism for checking WAL level changes makes sense.
>
> While pgbackrest backups of a primary wouldn't be impacted, it does support
> backing up from a replica (as do other backup systems, of
> course) and if data that's supposed to be on the replica isn't there because
> someone restarted PG with wal_level=minimal and then flipped it back to
> replica and got the replica to move past that part of the WAL by turning off hot
> standby, replaying, and then turning it back on, then the backup is going to
> also be missing that data.
>
> Perhaps that's where we need to have a stronger mechanism though- that is,
> if we hit the above WARNING, ever, set a flag somewhere that tools can check
> and which we could also check and throw further warnings about.  In other
> words, every time this replica is started, we could check for this flag and
> throw the same warning above about how data may be missing, and we could
> have pg_basebackup flat out refuse to back up from a replica where this flag
> has been set (maybe with some kind of override mechanism...  maybe not).
> After all, that's where the real issue here is, isn't it?
>
> > > I'm not exactly sure what that would look like, but suppose we had a
> > > feature where every time wal_level drops below replica, a counter
> > > gets incremented by 1, and that counter is saved in the control
> > > file. Or maybe when wal_level drops below minimal to none. Or maybe
> > > there are two counters. Anyway, the idea is that if you have a
> > > snapshot of the cluster at one time and a snapshot at another time,
> > > you can see whether anything scary has happened in the middle
> > > without needing all of the WAL in between.
> > >
> > > Maybe this is off-topic for this thread or not really needed, but
> > > I'm not sure. I don't think wal_level=none is a bad idea
> > > intrinsically, but I think it would be easy to implement it poorly
> > > and end up harming a lot of users. I have no problem with giving
> > > people a way to do dangerous things, but we should do our best to
> > > let people know how much danger they've incurred.
>
> I'm not sure that wal_level=none is really the right way to address this
> use-case.  We already have unlogged tables and that's pretty clean and
> meets the "we want to load data fast without having to pay for WAL" use case.
> The argument here seems to be that to take advantage of unlogged tables
> requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> TABLE' command instead of a 'CREATE TABLE' command.  That doesn't
> seem like a huge leap, but we could make it easier by just adding a
> 'table_log_default' or such GUC that could be set on the data loading role to
> have all tables created by it be unlogged.
I'm afraid to say that in the case to setup all tables as unlogged,
the user are forced to be under tension to
back up *all* commands from application, in preparation for unexpected crash.
This is because whenever the server crashes,
the unlogged tables are truncated and the DBA needs to
input the processings after the last backup again without exception.
I didn't think that this was easy and satisfied the user.

In addition, as long as the tables are unlogged, the user cannot be released from
this condition or (requirement ?) to back up all commands or
to guarantee that all commands are repeatable for the DBA.

When I consider the use case is the system of data warehouse
as described upthread, the size of each table can be large.
Thus, changing the status from unlogged to logged (recoverable)
takes much time under the current circumstances, which was discussed before.

By having the limited window of time,
during wal_level=none, I'd like to make wal_level=none work to
localize and minimize the burden to guarantee all commands are
repeatable. To achieve this, after switching wal_level from none to higher ones,
the patch must ensure crash recovery, though.

Sorry that my current patch doesn't complete this aspect fully at present
but, may I have your opinion about this ?

Best,
    Takamichi Osumi



Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost@snowman.net> wrote:
> > I'm not sure that wal_level=none is really the right way to address this
> > use-case.  We already have unlogged tables and that's pretty clean and
> > meets the "we want to load data fast without having to pay for WAL" use case.
> > The argument here seems to be that to take advantage of unlogged tables
> > requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> > TABLE' command instead of a 'CREATE TABLE' command.  That doesn't
> > seem like a huge leap, but we could make it easier by just adding a
> > 'table_log_default' or such GUC that could be set on the data loading role to
> > have all tables created by it be unlogged.
> I'm afraid to say that in the case to setup all tables as unlogged,
> the user are forced to be under tension to
> back up *all* commands from application, in preparation for unexpected crash.
> This is because whenever the server crashes,
> the unlogged tables are truncated and the DBA needs to
> input the processings after the last backup again without exception.
> I didn't think that this was easy and satisfied the user.

You'll need to explain how this is different from the proposed
'wal_level = none' option, since it sounded like that would be exactly
the same case..?

> In addition, as long as the tables are unlogged, the user cannot be released from
> this condition or (requirement ?) to back up all commands or
> to guarantee that all commands are repeatable for the DBA.

They can change the table to be logged though, if they wish to.

> When I consider the use case is the system of data warehouse
> as described upthread, the size of each table can be large.
> Thus, changing the status from unlogged to logged (recoverable)
> takes much time under the current circumstances, which was discussed before.

Ok- so the issue is that, today, we dump all of the table into the WAL
when we go from unlogged to logged, but as I outlined previously,
perhaps that's because we're missing a trick there when
wal_level=minimal.  If wal_level=minimal, then it would seem like we
could lock the table, then sync it and then mark is as logged, which is
more-or-less what you're asking to have be effectively done with the
proposed wal_level=none, but this would be an optimization for all
existing users of wal_level=minimal who have unlogged tables that they
want to change to logged, and this works on a per-table basis instead,
which seems like a better approach than a cluster-wide setting.

> By having the limited window of time,
> during wal_level=none, I'd like to make wal_level=none work to
> localize and minimize the burden to guarantee all commands are
> repeatable. To achieve this, after switching wal_level from none to higher ones,
> the patch must ensure crash recovery, though.

Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
to marked a bunch of unlogged tables over to being logged would be good
to add too.

> Sorry that my current patch doesn't complete this aspect fully at present
> but, may I have your opinion about this ?

Presently, my feeling is that we could address this use-case without
having to introduce a new cluster-wide WAL level, and that's the
direction I'd want to see this going.  Perhaps I'm missing something
about why the approach I've set forth above wouldn't work, and
wal_level=none would, but I've not seen it yet.

Thanks,

Stephen

Attachment

Re: Disable WAL logging to speed up data loading

From
"David G. Johnston"
Date:
On Mon, Nov 9, 2020 at 8:18 AM Stephen Frost <sfrost@snowman.net> wrote:
Presently, my feeling is that we could address this use-case without
having to introduce a new cluster-wide WAL level, and that's the
direction I'd want to see this going.  Perhaps I'm missing something
about why the approach I've set forth above wouldn't work, and
wal_level=none would, but I've not seen it yet.


+1

We are trying to address a performance optimization for an insert-only scenario on a limited set of tables by placing the entire cluster in a dangerous state.  The "copy table unlogged" solution is definitely closer to what we want - this is 
demonstrably worse.

For this case the fundamental feature that would seem to be required is an ability for a transaction commit to return only after the system has ensured that all of the new pages added to the relation during the scope of the transaction have made it to disk.  Something like:

BEGIN UNLOGGED TRANSACTION FOR table1, table2;
-- locking probably allows reads, definitely disallows concurrent writes, to the named tables
-- Disallow updates and deletes, do not use dead tuple space, for the tables named.  Should be able to do normal stuff for other tables?
-- Always create new pages
COPY TO table1;
COPY TO table2;
COMMIT; -- wait here until data files for table1 and table2 are completely written and the transaction alive flag is committed to the WAL.

I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL TABLES" and you'd get the initial database population optimization capability.

If the commit doesn't complete all of the newly created pages are junk.  Otherwise, you have a crash-recoverable state for those tables as regards those specific pages.

Conceptually, we need an ability to perform a partial CHECKPOINT that names specific tables, and make sure the crash-recovery works for those tables while figuring out what amount of effort to expend on informing the dba and alerting/preventing features that require wal from using those tables.

David J.

Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Mon, Nov 9, 2020 at 8:18 AM Stephen Frost <sfrost@snowman.net> wrote:
> > Presently, my feeling is that we could address this use-case without
> > having to introduce a new cluster-wide WAL level, and that's the
> > direction I'd want to see this going.  Perhaps I'm missing something
> > about why the approach I've set forth above wouldn't work, and
> > wal_level=none would, but I've not seen it yet.
>
> +1
>
> We are trying to address a performance optimization for an insert-only
> scenario on a limited set of tables by placing the entire cluster in a
> dangerous state.  The "copy table unlogged" solution is definitely closer
> to what we want - this is
> demonstrably worse.

Yeah, agreed.

> For this case the fundamental feature that would seem to be required is an
> ability for a transaction commit to return only after the system has
> ensured that all of the new pages added to the relation during the scope of
> the transaction have made it to disk.  Something like:
>
> BEGIN UNLOGGED TRANSACTION FOR table1, table2;
> -- locking probably allows reads, definitely disallows concurrent writes,
> to the named tables
> -- Disallow updates and deletes, do not use dead tuple space, for the
> tables named.  Should be able to do normal stuff for other tables?
> -- Always create new pages
> COPY TO table1;
> COPY TO table2;
> COMMIT; -- wait here until data files for table1 and table2 are completely
> written and the transaction alive flag is committed to the WAL.

That's certainly an interesting idea, but seems like a much larger step
than just making some improvements to how UNLOGGED tables work today,
and then perhaps some helper options to make it easier to create
UNLOGGED tables and change them from unlogged to logged when the
wal_level is set to 'minimal'.

Also- I don't think this would end up working for normally logged
relations at a wal_level higher than 'minimal', since if we don't
log those pages then they won't get to replicas.

> I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL
> TABLES" and you'd get the initial database population optimization
> capability.

Or just 'BEGIN UNLOGGED TRANSACTION'..  I wonder if we'd have to run
around and lock all tables as you're suggesting above or if we could
just lock them as they get used..

> If the commit doesn't complete all of the newly created pages are junk.
> Otherwise, you have a crash-recoverable state for those tables as regards
> those specific pages.

How would we track that and know which pages are junk?

> Conceptually, we need an ability to perform a partial CHECKPOINT that names
> specific tables, and make sure the crash-recovery works for those tables
> while figuring out what amount of effort to expend on informing the dba and
> alerting/preventing features that require wal from using those tables.

Yeah, seems pretty complicated.

Did you see an issue with the basic idea I proposed earlier, whereby an
unlogged table could become 'logged', while we are at wal_level=minimal,
by essentially checkpointing it (locking it, forcing out any buffers we
have associated with it, and then fsync'ing it- not sure how much of
that is already done in the unlogged->logged process but I would guess
most of it) while not actually writing it into the WAL?

Thanks,

Stephen

Attachment

Re: Disable WAL logging to speed up data loading

From
"David G. Johnston"
Date:
On Mon, Nov 9, 2020 at 10:36 AM Stephen Frost <sfrost@snowman.net> wrote:
* David G. Johnston (david.g.johnston@gmail.com) wrote:

> If the commit doesn't complete all of the newly created pages are junk.
> Otherwise, you have a crash-recoverable state for those tables as regards
> those specific pages.

How would we track that and know which pages are junk?

Every one of those pages would have a single dead transaction id contained within it.  If there is bookkeeping that needs to happen that could be wal logged - the goal here being not to avoid all wal but to avoid data wal.  I don't know enough about the internals here to be more specific.


> Conceptually, we need an ability to perform a partial CHECKPOINT that names
> specific tables, and make sure the crash-recovery works for those tables
> while figuring out what amount of effort to expend on informing the dba and
> alerting/preventing features that require wal from using those tables.

Yeah, seems pretty complicated.

Did you see an issue with the basic idea I proposed earlier, whereby an
unlogged table could become 'logged', while we are at wal_level=minimal,
by essentially checkpointing it (locking it, forcing out any buffers we
have associated with it, and then fsync'ing it- not sure how much of
that is already done in the unlogged->logged process but I would guess
most of it) while not actually writing it into the WAL?

That is basically half of what is described above - the part at commit when the relation is persisted to disk.  What your earlier description seems to be missing is the part about temporarily making a logged relation unlogged.  I envision that as being part of a transaction as opposed to a permanent attribute of the table.  I envision a storage parameter that allows individual relations to be considered as having wal_level='minimal' even if the system as a whole has, e.g., wal_level='replication'.  Only those could be forced into this temporarily unlogged mode.

One part I hadn't given thought to is indexes and how those would interact with this plan. Mostly due to lack of internals knowledge.

David J.

Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Mon, Nov 9, 2020 at 10:36 AM Stephen Frost <sfrost@snowman.net> wrote:
> > * David G. Johnston (david.g.johnston@gmail.com) wrote:
> > > If the commit doesn't complete all of the newly created pages are junk.
> > > Otherwise, you have a crash-recoverable state for those tables as regards
> > > those specific pages.
> >
> > How would we track that and know which pages are junk?
>
> Every one of those pages would have a single dead transaction id contained
> within it.  If there is bookkeeping that needs to happen that could be wal
> logged - the goal here being not to avoid all wal but to avoid data wal.  I
> don't know enough about the internals here to be more specific.

Yeah, not sure how well that would end up working..  Maybe there's a way
to get there, but definitely a fair bit of complicated work.

> > > Conceptually, we need an ability to perform a partial CHECKPOINT that
> > names
> > > specific tables, and make sure the crash-recovery works for those tables
> > > while figuring out what amount of effort to expend on informing the dba
> > and
> > > alerting/preventing features that require wal from using those tables.
> >
> > Yeah, seems pretty complicated.
> >
> > Did you see an issue with the basic idea I proposed earlier, whereby an
> > unlogged table could become 'logged', while we are at wal_level=minimal,
> > by essentially checkpointing it (locking it, forcing out any buffers we
> > have associated with it, and then fsync'ing it- not sure how much of
> > that is already done in the unlogged->logged process but I would guess
> > most of it) while not actually writing it into the WAL?
>
> That is basically half of what is described above - the part at commit when
> the relation is persisted to disk.

Right, think I agree with you there.

> What your earlier description seems to
> be missing is the part about temporarily making a logged relation
> unlogged.

While I get that there may be use-cases for that, it seems that the
use-case being described here doesn't require it- there just needs to be
a way to take the unlogged table and make it into a 'logged' table
without having to actually write it all into the WAL when wal_level is
minimal.

There may be another option to addressing the use-case as you're looking
at it though- by using partitioning.  That is, there's a partitioned
table which has logged tables in it, a new unlogged table is created and
added to the partitioned table, it's then loaded, and then it's
converted to being logged (or maybe it's loaded first and then added to
the partitioned table, either way).  This would also have the advantage
that you'd be able to continue making changes to the partitioned table
as you normally do, in general.

> I envision that as being part of a transaction as opposed to a
> permanent attribute of the table.  I envision a storage parameter that
> allows individual relations to be considered as having wal_level='minimal'
> even if the system as a whole has, e.g., wal_level='replication'.  Only
> those could be forced into this temporarily unlogged mode.

I mean ... we have a way of saying that individual relations have a
lower WAL level than others- they're UNLOGGED.  I'm still seeing this as
an opportunity to build on that and improve that, rather than invent
something new.

Thanks,

Stephen

Attachment

Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Mon, 9 Nov 2020 10:18:08 -0500, Stephen Frost <sfrost@snowman.net> wrote in 
> Greetings,
> 
> * osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> > When I consider the use case is the system of data warehouse
> > as described upthread, the size of each table can be large.
> > Thus, changing the status from unlogged to logged (recoverable)
> > takes much time under the current circumstances, which was discussed before.
> 
> Ok- so the issue is that, today, we dump all of the table into the WAL
> when we go from unlogged to logged, but as I outlined previously,
> perhaps that's because we're missing a trick there when
> wal_level=minimal.  If wal_level=minimal, then it would seem like we
> could lock the table, then sync it and then mark is as logged, which is

FWIW, the following is that, I think it works not only when
wal_level=minimal for SET UNLOGGED, and only works when minimal for
SET LOGGED.

https://www.postgresql.org/message-id/20201002.100621.1668918756520136893.horikyota.ntt@gmail.com

| For fuel(?) of the discussion, I tried a very-quick PoC for in-place
| ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
| trials of several ways, I drifted to the following way after poking
| several ways.
| 
| 1. Flip BM_PERMANENT of active buffers
| 2. adding/removing init fork
| 3. sync files,
| 4. Flip pg_class.relpersistence.
| 
| It always skips table copy in the SET UNLOGGED case, and only when
| wal_level=minimal in the SET LOGGED case.  Crash recovery seems
| working by some brief testing by hand.

> more-or-less what you're asking to have be effectively done with the
> proposed wal_level=none, but this would be an optimization for all
> existing users of wal_level=minimal who have unlogged tables that they
> want to change to logged, and this works on a per-table basis instead,
> which seems like a better approach than a cluster-wide setting.
> 
> > By having the limited window of time,
> > during wal_level=none, I'd like to make wal_level=none work to
> > localize and minimize the burden to guarantee all commands are
> > repeatable. To achieve this, after switching wal_level from none to higher ones,
> > the patch must ensure crash recovery, though.
> 
> Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
> to marked a bunch of unlogged tables over to being logged would be good
> to add too.

I agree to this aspect of the in-place flipping of UNLOGGED.

> > Sorry that my current patch doesn't complete this aspect fully at present
> > but, may I have your opinion about this ?
> 
> Presently, my feeling is that we could address this use-case without
> having to introduce a new cluster-wide WAL level, and that's the
> direction I'd want to see this going.  Perhaps I'm missing something
> about why the approach I've set forth above wouldn't work, and
> wal_level=none would, but I've not seen it yet.

Couldn't we have something like the following?

 ALTER TABLE table1, table2, table3 SET UNLOGGED;

That is, multiple target object specification in ALTER TABLE sttatement.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Horiguchi-San


On Tuesday, November 10, 2020 10:00 AM  Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> FWIW, the following is that, I think it works not only when wal_level=minimal
> for SET UNLOGGED, and only works when minimal for SET LOGGED.
>
> https://www.postgresql.org/message-id/20201002.100621.166891875652013
> 6893.horikyota.ntt@gmail.com
>
> | For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> | ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After
> some
> | trials of several ways, I drifted to the following way after poking
> | several ways.
> |
> | 1. Flip BM_PERMANENT of active buffers 2. adding/removing init fork 3.
> | sync files, 4. Flip pg_class.relpersistence.
> |
> | It always skips table copy in the SET UNLOGGED case, and only when
> | wal_level=minimal in the SET LOGGED case.  Crash recovery seems
> | working by some brief testing by hand.
> I agree to this aspect of the in-place flipping of UNLOGGED.
>
> Couldn't we have something like the following?
>  ALTER TABLE table1, table2, table3 SET UNLOGGED;
>
> That is, multiple target object specification in ALTER TABLE sttatement.
I *really* appreciate the 1st patch.
Did you register this patch with the current or next commitfest ?
I cannot find it. We should avoid that this patch misses people's attention.
When I'm available, I'd like to join the review for the patch as well.

Also, I supposed that as long as
something impossible to implement for wal_level=none is not founded,
we could develop both patches in parallel, because both have good points
described in the previous e-mails of this thread.

Best,
    Takamichi Osumi



Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Tue, 10 Nov 2020 09:33:12 -0500, Stephen Frost <sfrost@snowman.net> wrote in 
> Greetings,
> 
> * Kyotaro Horiguchi (horikyota.ntt@gmail.com) wrote:
> > For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> > ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> > trials of several ways, I drifted to the following way after poking
> > several ways.
> > 
> > 1. Flip BM_PERMANENT of active buffers
> > 2. adding/removing init fork
> > 3. sync files,
> > 4. Flip pg_class.relpersistence.
> > 
> > It always skips table copy in the SET UNLOGGED case, and only when
> > wal_level=minimal in the SET LOGGED case.  Crash recovery seems
> > working by some brief testing by hand.
> 
> Somehow missed that this patch more-or-less does what I was referring to
> down-thread, but I did want to mention that it looks like it's missing a
> necessary FlushRelationBuffers() call before the sync, otherwise there
> could be dirty buffers for the relation that's being set to LOGGED (with
> wal_level=minimal), which wouldn't be good.  See the comments above
> smgrimmedsync().
>
> > Of course, I haven't performed intensive test on it.
> 
> Reading through the thread, it didn't seem very clear, but we should
> definitely make sure that it does the right thing on replicas when going
> between unlogged and logged (and between logged and unlogged too), of
> course.
> 
> Thanks,

Thanks!  Since this feature is different from the feature that is
being proposed in this thhead, I started another thread for this
feature.

https://www.postgresql.org/message-id/20201111.173317.460890039962481381.horikyota.ntt@gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Thanks!  Since this feature is different from the feature that is
> being proposed in this thhead, I started another thread for this
> feature.
>
> https://www.postgresql.org/message-id/20201111.173317.460890039962481
> 381.horikyota.ntt@gmail.com

Thank you, Horiguchi-san(^^)  I was just about to ask you to separate the thread, because I think both features are
good:

* ALTER TABLE SET UNLOGGED/LOGGED without data copy
Good:
- Does not require server restart (if this feature can be used in all wal_level settings).

Bad:
- The user has to maintain and modify some scripts to use ALTER TABLE when adding or removing the tables/partitions to
loaddata into.  For example, if the data loading job specifies a partitioned table, he may forget to add ALTER TABLE
fornew partitions, resulting in slow data loading. 


* wal_level = none
Good:
- Easy to use.  The user does not have to be aware of what tables are loaded.  This can come in handy when migrating
froman older version or another DBMS, building test databases, and consolidating databases. 

Bad:
- Requires server restart.


* Both features
Bad:
- Requires taking database backup.
- Requires rebuilding the standby.


Sadly, the new thread's title includes a spelling mistake of extra r -- "In-placre".  (I hope this won't prevent the
searchhit in the mail archive.) 

I expect both features will be able to meet our customer's needs.  The worst scenario (I don't want to imagine!) is
thatneither feature fails to be committed.  So, let us continue both features.  I'll join Horiguchi-san's new thread,
andplease help us here too.  (I'll catch up with the recent discussion in this thread and reply.) 


> Couldn't we have something like the following?
>
>  ALTER TABLE table1, table2, table3 SET UNLOGGED;
>
> That is, multiple target object specification in ALTER TABLE sttatement.

Likewise, can't we do ALTER TABLE SET UNLOGGED/LOGGED against a partitioned table?  Currently, the statement succeeds
butnone of the partitioned table nor its partitions is set unlogged (pg_class.relpersistence remains 'p').  Is this
intended? If it's a bug, I'm willing to fix it so that it reports an eror.  Of course, it's good to make all partitions
unloggedat once. 


Regards
Takayuki Tsunakawa




Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> * ALTER TABLE SET UNLOGGED/LOGGED without data copy
> Good:
> - Does not require server restart (if this feature can be used in all wal_level settings).
>
> Bad:
> - The user has to maintain and modify some scripts to use ALTER TABLE when adding or removing the tables/partitions
toload data into.  For example, if the data loading job specifies a partitioned table, he may forget to add ALTER TABLE
fornew partitions, resulting in slow data loading. 

I'm not sure that I see this as really being much of an issue.  Perhaps
there are some things we can do, as I mentioned before, to make it
easier for users to have tables be created as unlogged from the start,
or to be able to ALTER TABLE a bunch of tables at once (using all in
tablespace, or maybe having an ALTER TABLE on a partitioned table
cascade to the partitions), but overall the risk here seems very low-
clearly whatever processing is running to load the data into a
particular table knows what the table is and adding an ALTER TABLE into
it would be trivial.

Specifically, for a partitioned table, I would think the load would go
something like:

CREATE UNLOGGED TABLE ...
load all of the data
ALTER TABLE ... SET LOGGED
ALTER TABLE ... ATTACH PARTITION

If the data load could all be done in a single transaction then you
wouldn't even need to create the table as UNLOGGED or issue the SET
LOGGED, with wal_level=minimal, you just need to create the table in the
same transaction that you do the data load in.

> * wal_level = none
> Good:
> - Easy to use.  The user does not have to be aware of what tables are loaded.  This can come in handy when migrating
froman older version or another DBMS, building test databases, and consolidating databases. 

A GUC that allowed users to set a default for newly created tables to be
unlogged would also address this.

> Bad:
> - Requires server restart.

Introducing yet another wal_level strikes me as a very large step, one
that the arguments presented here for why it'd be worth it don't come
anywhere near justifying that step.

> I expect both features will be able to meet our customer's needs.  The worst scenario (I don't want to imagine!) is
thatneither feature fails to be committed.  So, let us continue both features.  I'll join Horiguchi-san's new thread,
andplease help us here too.  (I'll catch up with the recent discussion in this thread and reply.) 

While you're certainly welcome to spend your time where you wish to, if,
as you say, making these changes to how tables can be switched from
unlogged to logged with wal_level=minimal meets this use-case then that
strikes me as definitely the right approach and removes any
justification for adding another wal_level.

> > Couldn't we have something like the following?
> >
> >  ALTER TABLE table1, table2, table3 SET UNLOGGED;
> >
> > That is, multiple target object specification in ALTER TABLE sttatement.
>
> Likewise, can't we do ALTER TABLE SET UNLOGGED/LOGGED against a partitioned table?  Currently, the statement succeeds
butnone of the partitioned table nor its partitions is set unlogged (pg_class.relpersistence remains 'p').  Is this
intended? If it's a bug, I'm willing to fix it so that it reports an eror.  Of course, it's good to make all partitions
unloggedat once. 

I agree that this doesn't seem quite right and considering the way other
commands work like CREATE INDEX, I would think that doing such an ALTER
TABLE would recurse to the individual partitions (skipping over any
which are already set to the persistance desired..).

Thanks,

Stephen

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Stephen Frost <sfrost@snowman.net>
> I'm not sure that I see this as really being much of an issue.  Perhaps there are
> some things we can do, as I mentioned before, to make it easier for users to
> have tables be created as unlogged from the start, or to be able to ALTER
> TABLE a bunch of tables at once (using all in tablespace, or maybe having an
> ALTER TABLE on a partitioned table cascade to the partitions), but overall the
> risk here seems very low- clearly whatever processing is running to load the
> data into a particular table knows what the table is and adding an ALTER
> TABLE into it would be trivial.

Thank you for your opinion.  I'm glad to see great people like Robert, Magnus and you have given comments (honestly, I
didn'texpect this trivial feature wouldn't get attention.)  I'll read those mails from the oldest including this, and
replyto them if needed. 


> > Likewise, can't we do ALTER TABLE SET UNLOGGED/LOGGED against a
> partitioned table?  Currently, the statement succeeds but none of the
> partitioned table nor its partitions is set unlogged (pg_class.relpersistence
> remains 'p').  Is this intended?  If it's a bug, I'm willing to fix it so that it reports
> an eror.  Of course, it's good to make all partitions unlogged at once.
>
> I agree that this doesn't seem quite right and considering the way other
> commands work like CREATE INDEX, I would think that doing such an ALTER
> TABLE would recurse to the individual partitions (skipping over any which are
> already set to the persistance desired..).

OK, I'll try to propagate the alteration to partitions.  (I wish the fix will be easy, but I'm afraid some devil is
lurking...) I'd appreciate it if someone could kindly point out the devil if he/she knows the past history. 


Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi


On Thursday, October 29, 2020 11:42 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> BTW, with the patch, I observed that PREPARE TRANSACTION and COMMIT
> PREPARED caused assertion failure in my env, as I pointed upthread.
> 
> How does the patch handle other feature depending on the existence of WAL,
> e.g., pg_logical_emit_message()?

I've updated the first patch, based on comments
from both Tsunakwa-San and Fujii-San mainly.
I'll take other comments in the next patch.
(Note that this patch passes make check-world but
doesn't pass installcheck-world yet.)

The assertion failure Fujii-San reported in the past has been protected by
adding a check to detect whether PREPARE TRANSACTION is issued
when wal_level=none or not in the v02.
Meanwhile, I didn't change the code for COMMIT PREPARED
because prohibiting usage of PREPARE TRANSACTION
means user cannot use COMMIT PREPARED as well.
I learnt this way of modification from the case that when max_prepared_transaction
is set to zero, PREPARE TRANSACTION cannot be used because of wa_level check,
while issuing COMMIT TRANSACTION itself doesn't claim wal_level.
Just prohibiting PREPARE TRANSACTION seemed OK for me.

As for pg_resetwal (and other commands like pg_rewind),
I didn't make any change. pg_resetwal is used to make corrupted server
start up again as a *last* resort by guessing the content of the control file,
while wal_level=none is designed never to make
the server start again when any unexpected crash is detected.
Here, the documentation in the patch about wal_level=none 
requests user to recreate the whole cluster again
when such a corruption of the cluster happens. Thus, it's not natural to think
that user of wal_level=none will continue to use the coruppted cluster forcibly
by applying pg_resetwal. This is the reason I made no fix of pg_resetwal.
In terms of other commands, I don't think there was a need to fix.

By the way, I'm not sure why functions related to replication origin
(e.g. pg_replication_origin_create) doesn't have a check of
wal_level. It can be used when wal_level < logical even though
their purposes are to safely keep track of replication progress.
Did it mean something special to execute such function when wal_level < logical ?
In the patch, I added an error check on this point.

Another similar case is that
there's no test to check wal_level for pg_logical_emit_message.
In my humble opinion, pg_logical_emit_message should not be used
when wal_level <= minimal. I didn't think that
without setting up a slot for logical replication
(with the restriction that user cannot create a slot by 
pg_create_physical_replication_slot when wal_level < replica),
plugins will utilize message from pg_logical_emit_message.
Or, if there is a slot that has been created before
restarting the server to turn on wal_level=none,
I cannot get the worth to execute the function
because other kinds of WAL are not issued.
Thus, I prohibited the usage of pg_logical_emit_message this time.
Did it make sense ?


Best,
    Takamichi Osumi

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com>
> I'm also concerned about the way that this proposed feature interacts with
> incremental backup capabilities that already exist in tools like pgBackRest,
> EDB's BART, pg_probackup, and future things we might want to introduce into
> core, along the lines of what I have previously proposed. Now, I think
> pgBackRest uses only timestamps and checksums, so it probably doesn't care,
> but some of the other solutions rely on WAL-scanning to gather a list of
> changed blocks. I guess there's no reason that they can't notice the wal_level
> being changed and do the right thing; they should probably have that kind of
> capability already. Still, it strikes me that it might be useful if we had a stronger
> mechanism.

Having a quick look, those backup tools seem to require setting wal_level to replica or higher.  That's no wonder,
becauserecovering the database needs WAL for non-relation resources such as pg_control and relation map.  So, I think
wal_level= none won't introduce new issues (compared to wal_level = minimal, which also can lack WAL records for some
dataupdates.)
 


> By the way, another problem here is that some AMs - e.g. GiST, IIRC - use LSNs
> to figure out whether a block has changed. For temporary and unlogged tables,
> we use "fake" LSNs that are generated using a counter, but that approach only
> works because such relations are never really WAL-logged. Mixing fake LSNs
> and real LSNs will break stuff, and not bumping the LSN when the page
> changes probably will, too.

Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged temporary GiST indexes use backend-local sequence
values. Other unlogged and temporary relations don't set LSNs on pages.  So, I think it's enough to call
GetFakeLSNForUnloggedRel()when wal_level = none as well.
 


Regards
Takayuki Tsunakawa


RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:

# It'd be helpful if you could send mails in text format, not HTML.

 

From: David G. Johnston <david.g.johnston@gmail.com>

> For this case the fundamental feature that would seem to be required is an ability for a transaction commit to return only after the system has ensured that all of the new pages added to the relation during the scope of the transaction have made it to disk.  Something like:

>

> BEGIN UNLOGGED TRANSACTION FOR table1, table2;

> -- locking probably allows reads, definitely disallows concurrent writes, to the named tables

> -- Disallow updates and deletes, do not use dead tuple space, for the tables named.  Should be able to do normal stuff for other tables?

> -- Always create new pages

> COPY TO table1;

> COPY TO table2;

> COMMIT; -- wait here until data files for table1 and table2 are completely written and the transaction alive flag is committed to the WAL.

>

> I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL TABLES" and you'd get the initial database population optimization capability.

>

> If the commit doesn't complete all of the newly created pages are junk.  Otherwise, you have a crash-recoverable state for those tables as regards those specific pages.

 

As Steven-san said, I don't want to go this complicated direction.  Plus, putting my feet in the user's shoes, I want to try to avoid introducing a new SQL syntax for this kind of performance boost, which requires applications and maintenance scripts and testing.

 

 

Regards

Takayuki Tsunakawa

 

RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello


In the past discussion of wal_level=none,
there were some ideas to trace the change of wal_level,
in other words, *stronger mechanism* to check wal_level.
I agree with the idea to have a new monitoring item
and would like to implement those kind of, or one of those ideas for the next patch.
But I'm not sure where is the right place to write the information.
Was the best place the control file ?
On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost@snowman.net> wrote:
> Checking the WAL level certainly seems critical to anything that's reading the
> WAL.  We certainly do this already when running as a
> replica:
>
> ereport(WARNING,
>         (errmsg("WAL was generated with wal_level=minimal, data may be
> missing"),
>         errhint("This happens if you temporarily set wal_level=minimal
> without taking a new base backup.")));
>
> There's definitely a question about if a WARNING there is really sufficient or
> not, considering that you could end up with 'logged'
> tables on the replica that are missing data, but I'm not sure that inventing a
> new, independent, mechanism for checking WAL level changes makes sense.
>
> While pgbackrest backups of a primary wouldn't be impacted, it does support
> backing up from a replica (as do other backup systems, of
> course) and if data that's supposed to be on the replica isn't there because
> someone restarted PG with wal_level=minimal and then flipped it back to
> replica and got the replica to move past that part of the WAL by turning off hot
> standby, replaying, and then turning it back on, then the backup is going to
> also be missing that data.
>
> Perhaps that's where we need to have a stronger mechanism though- that is,
> if we hit the above WARNING, ever, set a flag somewhere that tools can check
> and which we could also check and throw further warnings about.  In other
> words, every time this replica is started, we could check for this flag and
> throw the same warning above about how data may be missing, and we could
> have pg_basebackup flat out refuse to back up from a replica where this flag
> has been set (maybe with some kind of override mechanism...  maybe not).
> After all, that's where the real issue here is, isn't it?
The first idea is warning that means replica could miss some data.
This is to notify some tools that taking a backup from replica is dangerous.
This can be applied to a change from replica to minimal(or none) and useful
because having the wal_level lowered to minimal does not mean
unclean shutdown or unexpected stoppage but it means a replica might get corrupted.

On Tuesday, Nov 3, 2020 12:28 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I'm not exactly sure what that would look like, but suppose we had a feature
> where every time wal_level drops below replica, a counter gets incremented by
> 1, and that counter is saved in the control file. Or maybe when wal_level drops
> below minimal to none. Or maybe there are two counters. Anyway, the idea is
> that if you have a snapshot of the cluster at one time and a snapshot at
> another time, you can see whether anything scary has happened in the middle
> without needing all of the WAL in between.
The second idea is incremental counter that indicates drop of wal_level
from replica to minimal (or like from minimal to none).
Its purpose is to compare the wal_level changes between snapshots.
When any monitoring tools detect any difference of the counter,
we can predict something happened immediately without checking WAL in between.

> > This would also be something that should be exposed as monitoring
> > points (which it could be if it's in pg_control). That is, I can
> > imagine a *lot* of installations that would definitely want an alert
> > to fire if the cluster has ever been started up in a wal_level=none or
> > wal_level=minimal, at least up until the point where somebody has run
> > a new full backup.
>
> I agree with the general idea that it'd be good for anything we do here to be
> made available for monitoring tools to be able to see.
The last one is notification of permanent damage from initial cluster parameter,
although this may be similar to the second one.
This is an indication that shows some part of data are not recovered,
if the cluster was initialized and loaded some data during wal_level=none.

Exposing the monitoring item for tools sounds reasonable for me of course.
Was the control file the best place for the information to implement those kind of ideas ?
What I'm concerned about is that those three items are not necessarily meaningful for all users.
Also, it seems better the size of the control file should be saved
because the size is limited to less than 512 bytes for atomic writes, which is written in a comment.
Here, I didn't say that I'll take all of those ideas into the patch
but this perspective doesn't change whenever we change the content of control file, I would say.
Another aspect unique to the last idea is that
the value will be no use after taking a new full backup.
Thus, I'm not sure if writing that kind of information in the control file is right or not.

Before my implementation,
I would like to get an agreement from the community on this point.


Best,
    Takamichi Osumi



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> there were some ideas to trace the change of wal_level,
> in other words, *stronger mechanism* to check wal_level.
> I agree with the idea to have a new monitoring item
> and would like to implement those kind of, or one of those ideas for the next
> patch.
> But I'm not sure where is the right place to write the information.
> Was the best place the control file ?

Yes, I think so (I find nowhere else.)


> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost@snowman.net>
> wrote:
> > Checking the WAL level certainly seems critical to anything that's reading the
> > WAL.  We certainly do this already when running as a
> > replica:
> >
> > ereport(WARNING,
> >         (errmsg("WAL was generated with wal_level=minimal, data may be
> > missing"),
> >         errhint("This happens if you temporarily set wal_level=minimal
> > without taking a new base backup.")));
> >
> > There's definitely a question about if a WARNING there is really sufficient or
> > not, considering that you could end up with 'logged'
> > tables on the replica that are missing data, but I'm not sure that inventing a
> > new, independent, mechanism for checking WAL level changes makes
> sense.
> >
> > While pgbackrest backups of a primary wouldn't be impacted, it does
> support
> > backing up from a replica (as do other backup systems, of
> > course) and if data that's supposed to be on the replica isn't there because
> > someone restarted PG with wal_level=minimal and then flipped it back to
> > replica and got the replica to move past that part of the WAL by turning off hot
> > standby, replaying, and then turning it back on, then the backup is going to
> > also be missing that data.
> >
> > Perhaps that's where we need to have a stronger mechanism though- that is,
> > if we hit the above WARNING, ever, set a flag somewhere that tools can check
> > and which we could also check and throw further warnings about.  In other
> > words, every time this replica is started, we could check for this flag and
> > throw the same warning above about how data may be missing, and we could
> > have pg_basebackup flat out refuse to back up from a replica where this flag
> > has been set (maybe with some kind of override mechanism...  maybe not).
> > After all, that's where the real issue here is, isn't it?
> The first idea is warning that means replica could miss some data.
> This is to notify some tools that taking a backup from replica is dangerous.
> This can be applied to a change from replica to minimal(or none) and useful
> because having the wal_level lowered to minimal does not mean
> unclean shutdown or unexpected stoppage but it means a replica might get
> corrupted.

I don't know why WARNING was chosen.  I think it should be FATAL, resulting in the standby shutdown, disabling
restartingit, and urging the user to rebuild the standby.  (I guess that's overreaction because the user may not
performoperations that lack WAL while wal_level is minimal.) 


> On Tuesday, Nov 3, 2020 12:28 AM Robert Haas <robertmhaas@gmail.com>
> wrote:
> > I'm not exactly sure what that would look like, but suppose we had a feature
> > where every time wal_level drops below replica, a counter gets incremented
> by
> > 1, and that counter is saved in the control file. Or maybe when wal_level
> drops
> > below minimal to none. Or maybe there are two counters. Anyway, the idea is
> > that if you have a snapshot of the cluster at one time and a snapshot at
> > another time, you can see whether anything scary has happened in the
> middle
> > without needing all of the WAL in between.
> The second idea is incremental counter that indicates drop of wal_level
> from replica to minimal (or like from minimal to none).
> Its purpose is to compare the wal_level changes between snapshots.
> When any monitoring tools detect any difference of the counter,
> we can predict something happened immediately without checking WAL in
> between.
>
> > > This would also be something that should be exposed as monitoring
> > > points (which it could be if it's in pg_control). That is, I can
> > > imagine a *lot* of installations that would definitely want an alert
> > > to fire if the cluster has ever been started up in a wal_level=none or
> > > wal_level=minimal, at least up until the point where somebody has run
> > > a new full backup.
> >
> > I agree with the general idea that it'd be good for anything we do here to be
> > made available for monitoring tools to be able to see.
> The last one is notification of permanent damage from initial cluster parameter,
> although this may be similar to the second one.
> This is an indication that shows some part of data are not recovered,
> if the cluster was initialized and loaded some data during wal_level=none.
>
> Exposing the monitoring item for tools sounds reasonable for me of course.
> Was the control file the best place for the information to implement those kind
> of ideas ?
> What I'm concerned about is that those three items are not necessarily
> meaningful for all users.
> Also, it seems better the size of the control file should be saved
> because the size is limited to less than 512 bytes for atomic writes, which is
> written in a comment.
> Here, I didn't say that I'll take all of those ideas into the patch
> but this perspective doesn't change whenever we change the content of control
> file, I would say.
> Another aspect unique to the last idea is that
> the value will be no use after taking a new full backup.
> Thus, I'm not sure if writing that kind of information in the control file is right or
> not.

Let's depict the situation.  I may be misunderstanding, so any correction would be much welcome.  Here, I call the new
fieldwal_level_change_counter, which should be changed to a proper name. 

1. wal_level = replica.  Take a base backup and store it in $BACKUPDIR/20201119/.
  wal_level_change_counter = 0

2. Set wal_level = minimal or none, and restart the instance.  Perform some operations.
  wal_level_change_counter = 1

3. Set wal_level = replica, and restart the instance.
  wal_level_change_counter = 1

4. Some monitoring system compares the values of   wal_level_change_counter in $BACKUPDIR/20201119/ and $PGDATA/, and
noticesthe difference (0 and 1 respectively.) 
It warns the user that he/she should take a full backup because some WAL may be missing to recover the latest data from
thelast backup in $BACKUPDIR/20201119/. 


But I think this is a separate patch, because the issue already applies to wal_level = minimal.



Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello


On Thursday, Nov 19, 2020 12:45 PM Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
> > On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost@snowman.net>
> > wrote:
> > > Checking the WAL level certainly seems critical to anything that's
> > > reading the WAL.  We certainly do this already when running as a
> > > replica:
> > >
> > > ereport(WARNING,
> > >         (errmsg("WAL was generated with wal_level=minimal, data may
> > > be missing"),
> > >         errhint("This happens if you temporarily set
> > > wal_level=minimal without taking a new base backup.")));
> > >
> > > There's definitely a question about if a WARNING there is really
> > > sufficient or not, considering that you could end up with 'logged'
> > > tables on the replica that are missing data, but I'm not sure that
> > > inventing a new, independent, mechanism for checking WAL level
> > > changes makes
> > sense.
> I don't know why WARNING was chosen.  I think it should be FATAL,
> resulting in the standby shutdown, disabling restarting it, and urging the user
> to rebuild the standby.  (I guess that's overreaction because the user may
> not perform operations that lack WAL while wal_level is minimal.)
Yeah, I agree that WARNING is not sufficient.


> > The second idea is incremental counter that indicates drop of
> > wal_level from replica to minimal (or like from minimal to none).
> > Its purpose is to compare the wal_level changes between snapshots.
> > When any monitoring tools detect any difference of the counter, we can
> > predict something happened immediately without checking WAL in
> > between.
>
> Let's depict the situation.  I may be misunderstanding, so any correction
> would be much welcome.  Here, I call the new field
> wal_level_change_counter, which should be changed to a proper name.
>
> 1. wal_level = replica.  Take a base backup and store it in
> $BACKUPDIR/20201119/.
>   wal_level_change_counter = 0
>
> 2. Set wal_level = minimal or none, and restart the instance.  Perform some
> operations.
>   wal_level_change_counter = 1
>
> 3. Set wal_level = replica, and restart the instance.
>   wal_level_change_counter = 1
>
> 4. Some monitoring system compares the values of
> wal_level_change_counter in $BACKUPDIR/20201119/ and $PGDATA/, and
> notices the difference (0 and 1 respectively.)
> It warns the user that he/she should take a full backup because some WAL
> may be missing to recover the latest data from the last backup in
> $BACKUPDIR/20201119/.
My understanding is completely same as the description Tsunakawa-San wrote above.

> But I think this is a separate patch, because the issue already applies to
> wal_level = minimal.
OK. The range of this patch's responsibility was obscure, but now I'm fine.
I understand that I don't need to incorporate those mechanisms in the new wal_level patch.


Best,
    Takamichi Osumi



Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Thu, 2020-11-19 at 05:24 +0000, osumi.takamichi@fujitsu.com wrote:
> > > > ereport(WARNING,
> > > >          (errmsg("WAL was generated with wal_level=minimal, data may
> > > > be missing"),
> > > >          errhint("This happens if you temporarily set
> > > > wal_level=minimal without taking a new base backup.")));
> > > > There's definitely a question about if a WARNING there is really
> > > > sufficient or not, considering that you could end up with 'logged'
> > > > tables on the replica that are missing data, but I'm not sure that
> > > > inventing a new, independent, mechanism for checking WAL level
> > > > changes makes
> > > sense.
> >
> > I don't know why WARNING was chosen.  I think it should be FATAL,
> > resulting in the standby shutdown, disabling restarting it, and urging the user
> > to rebuild the standby.  (I guess that's overreaction because the user may
> > not perform operations that lack WAL while wal_level is minimal.)
> 
> Yeah, I agree that WARNING is not sufficient.

I missed that this is only a warning when I looked at it before.
Yes, it should be a fatal error.

I think that there should two patches: one that turns this warning into
a FATAL and should be backpatched.  If you change the test to

   ControlFile->wal_level <= WAL_LEVEL_MINIMAL

it will automatically work for your new feature too.

Then your new wal_level would be a second patch only for HEAD.

With that, the only remaining consideration with this patch is the danger
that enabling wal_level=none without taking a backup before can lead to
data loss.  But that is intended, so I think that an unmistakable warning
in the documentation would be good enough.

Yours,
Laurenz Albe




RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
(1)
 #define RelationNeedsWAL(relation)                                        \
+    (wal_level != WAL_LEVEL_NONE &&                                        \
     ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&    \
      (XLogIsNeeded() ||                                                    \
       (relation->rd_createSubid == InvalidSubTransactionId &&            \
-       relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
+       relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId))))

You can remove one pair of parentheses for readability as follows:

 #define RelationNeedsWAL(relation)                                        \
+    (wal_level != WAL_LEVEL_NONE &&                                        \
     (relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&    \
      (XLogIsNeeded() ||                                                    \
       (relation->rd_createSubid == InvalidSubTransactionId &&            \
-       relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
+       relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))


(2)
      /*
+     * Detect if the server previously crashed under wal_level='none' or not.
+     */
+    if (ControlFile->wal_level == WAL_LEVEL_NONE &&
+        (ControlFile->state != DB_SHUTDOWNED && ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY))
+        ereport(ERROR,
+                (errmsg("detected an unexpected server shutdown when WAL logging was disabled"),
+                 errhint("It looks like you need to deploy a new cluster from your full backup again.")));
+
+    /*
      * Check that contents look valid.
      */
     if (!XRecOffIsValid(ControlFile->checkPoint))

I think the above safeguard should be placed right before the following code, because it's better to first check
controlfile contents and emit the message for the previous shutdown state.
 

#ifdef XLOG_REPLAY_DELAY
    if (ControlFile->state != DB_SHUTDOWNED)
        pg_usleep(60000000L);
#endif


(3)
@@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
         return EndPos;
     }
 
+    /* Issues WAL only for XLOG resources */
+    if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
+    {
+        XLogResetInsertion();
+        return GetLatestCheckPointLSN();
+    }

It should be correct to use GetXLogInsertRecPtr() instead of GetLatestCheckPointLSN(), because what XLogInsert() should
returnis the end position of last WAL record inserted (= start position of next WAL).
 

Why don't we add transaction completion WAL records?  That is, add "rmid != RM_XACT_ID" to the if condition.  What we
reallywant to eliminate for performance is the data modification WAL records.  This might allow us to insert special
checksto prevent PREPARE TRANSACTION and other stuff.
 


(4)
@@ -404,6 +404,10 @@ pg_logical_emit_message_bytea(PG_FUNCTION_ARGS)
     bytea       *data = PG_GETARG_BYTEA_PP(2);
     XLogRecPtr    lsn;
 
+    if (wal_level < WAL_LEVEL_REPLICA)
+        ereport(ERROR,
+                errmsg("propagating a message requires wal_level \"replica\" or \"logical\""));
+
     lsn = LogLogicalMessage(prefix, VARDATA_ANY(data), VARSIZE_ANY_EXHDR(data),
                             transactional);
     PG_RETURN_LSN(lsn);

@@ -192,6 +192,10 @@ replorigin_check_prerequisites(bool check_slots, bool recoveryOK)
                 (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
                  errmsg("cannot manipulate replication origins during recovery")));
 
+    if (wal_level < WAL_LEVEL_REPLICA)
+        ereport(ERROR,
+                errmsg("creating replication origins requires wal_level \"replica\" or \"logical\""));
+
 }

@@ -625,6 +625,9 @@ standard_ProcessUtility(PlannedStmt *pstmt,
                         break;
 
                     case TRANS_STMT_PREPARE:
+                        if (wal_level == WAL_LEVEL_NONE)
+                            ereport(ERROR,
+                                    errmsg("cannot execute PREPARE TRANSACTION when WAL logging is disabled"));

What happens without these checks?



Regards
Takayuki Tsunakawa


RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello, Laurenz



On Thursday, Nov19, 2020 4:50 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote
> On Thu, 2020-11-19 at 05:24 +0000, osumi.takamichi@fujitsu.com wrote:
> > > > > ereport(WARNING,
> > > > >          (errmsg("WAL was generated with wal_level=minimal, data
> > > > > may be missing"),
> > > > >          errhint("This happens if you temporarily set
> > > > > wal_level=minimal without taking a new base backup."))); There's
> > > > > definitely a question about if a WARNING there is really
> > > > > sufficient or not, considering that you could end up with 'logged'
> > > > > tables on the replica that are missing data, but I'm not sure
> > > > > that inventing a new, independent, mechanism for checking WAL
> > > > > level changes makes
> > > > sense.
> > >
> > > I don't know why WARNING was chosen.  I think it should be FATAL,
> > > resulting in the standby shutdown, disabling restarting it, and
> > > urging the user to rebuild the standby.  (I guess that's
> > > overreaction because the user may not perform operations that lack
> > > WAL while wal_level is minimal.)
> >
> > Yeah, I agree that WARNING is not sufficient.
> 
> I missed that this is only a warning when I looked at it before.
> Yes, it should be a fatal error.
> 
> I think that there should two patches: one that turns this warning into a
> FATAL and should be backpatched.  If you change the test to
> 
>    ControlFile->wal_level <= WAL_LEVEL_MINIMAL
> 
> it will automatically work for your new feature too.
> Then your new wal_level would be a second patch only for HEAD.
Yeah, this suggestion to divide the patch into two sounds really good.
Thank you. I'll post separated patches next time.

> 
> With that, the only remaining consideration with this patch is the danger that
> enabling wal_level=none without taking a backup before can lead to data loss.
> But that is intended, so I think that an unmistakable warning in the
> documentation would be good enough.
Yes, thank you for reviewing the documents in the patch.


Best,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> On Thu, 2020-11-19 at 05:24 +0000, osumi.takamichi@fujitsu.com wrote:
> > > > > ereport(WARNING,
> > > > >          (errmsg("WAL was generated with wal_level=minimal, data may
> > > > > be missing"),
> > > > >          errhint("This happens if you temporarily set
> > > > > wal_level=minimal without taking a new base backup.")));
> > > > > There's definitely a question about if a WARNING there is really
> > > > > sufficient or not, considering that you could end up with 'logged'
> > > > > tables on the replica that are missing data, but I'm not sure that
> > > > > inventing a new, independent, mechanism for checking WAL level
> > > > > changes makes
> > > > sense.
> > >
> > > I don't know why WARNING was chosen.  I think it should be FATAL,
> > > resulting in the standby shutdown, disabling restarting it, and urging the user
> > > to rebuild the standby.  (I guess that's overreaction because the user may
> > > not perform operations that lack WAL while wal_level is minimal.)
> >
> > Yeah, I agree that WARNING is not sufficient.
>
> I missed that this is only a warning when I looked at it before.
> Yes, it should be a fatal error.

Yeah, the more that I think about it, the more that I tend to agree with
this.  Does anyone want to argue against changing this into a FATAL..?

Thanks,

Stephen

Attachment

Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost <sfrost@snowman.net> wrote in 
> Greetings,
> 
> * Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> > On Thu, 2020-11-19 at 05:24 +0000, osumi.takamichi@fujitsu.com wrote:
> > > > > > ereport(WARNING,
> > > > > >          (errmsg("WAL was generated with wal_level=minimal, data may
> > > > > > be missing"),
> > > > > >          errhint("This happens if you temporarily set
> > > > > > wal_level=minimal without taking a new base backup.")));
> > > > > > There's definitely a question about if a WARNING there is really
> > > > > > sufficient or not, considering that you could end up with 'logged'
> > > > > > tables on the replica that are missing data, but I'm not sure that
> > > > > > inventing a new, independent, mechanism for checking WAL level
> > > > > > changes makes
> > > > > sense.
> > > >
> > > > I don't know why WARNING was chosen.  I think it should be FATAL,
> > > > resulting in the standby shutdown, disabling restarting it, and urging the user
> > > > to rebuild the standby.  (I guess that's overreaction because the user may
> > > > not perform operations that lack WAL while wal_level is minimal.)
> > > 
> > > Yeah, I agree that WARNING is not sufficient.
> > 
> > I missed that this is only a warning when I looked at it before.
> > Yes, it should be a fatal error.
> 
> Yeah, the more that I think about it, the more that I tend to agree with
> this.  Does anyone want to argue against changing this into a FATAL..?

I don't come up with a use case where someone needs to set
wal_level=minimal for archive recovery. So +1 to change it to FATAL.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost <sfrost@snowman.net>
> > * Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> > > I missed that this is only a warning when I looked at it before.
> > > Yes, it should be a fatal error.
> >
> > Yeah, the more that I think about it, the more that I tend to agree with
> > this.  Does anyone want to argue against changing this into a FATAL..?
>
> I don't come up with a use case where someone needs to set
> wal_level=minimal for archive recovery. So +1 to change it to FATAL.

Thank you all.  I'd like to give Osumi-san an opportunity to write a patch and showing the evidence of successful test
ifyou don't mind.  He is very young and newbie to Postgres, so even a small contribution would encourage him to make
furthercontributions. 


Regards
Takayuki Tsunakawa





RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello


On Friday, Nov 20, 2020 9:33 AM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost <sfrost@snowman.net>
> > > * Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> > > > I missed that this is only a warning when I looked at it before.
> > > > Yes, it should be a fatal error.
> > >
> > > Yeah, the more that I think about it, the more that I tend to agree
> > > with this.  Does anyone want to argue against changing this into a
> FATAL..?
> >
> > I don't come up with a use case where someone needs to set
> > wal_level=minimal for archive recovery. So +1 to change it to FATAL.
It seems we reached the agreement for this fix.

> Thank you all.  I'd like to give Osumi-san an opportunity to write a patch and
> showing the evidence of successful test if you don't mind.  He is very young
> and newbie to Postgres, so even a small contribution would encourage him to
> make further contributions.
Sure. Let me do that, including the test.

Best,
    Takamichi Osumi




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi

This time, I updated my patch to address comments below only.
Please have a look at updated one.

On Thursday, Nov 19, 2020 4:51 PM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com>
> (1)
>  #define RelationNeedsWAL(relation)
>                     \
> +    (wal_level != WAL_LEVEL_NONE &&
>                         \
>      ((relation)->rd_rel->relpersistence ==
> RELPERSISTENCE_PERMANENT &&    \
>       (XLogIsNeeded() ||
>                             \
>        (relation->rd_createSubid == InvalidSubTransactionId &&
>         \
> -       relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
> +       relation->rd_firstRelfilenodeSubid ==
> InvalidSubTransactionId))))
> 
> You can remove one pair of parentheses for readability as follows:
> 
>  #define RelationNeedsWAL(relation)
>                     \
> +    (wal_level != WAL_LEVEL_NONE &&
>                         \
>      (relation)->rd_rel->relpersistence ==
> RELPERSISTENCE_PERMANENT &&    \
>       (XLogIsNeeded() ||
>                             \
>        (relation->rd_createSubid == InvalidSubTransactionId &&
>         \
> -       relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
> +       relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
Done. 

> (2)
>       /*
> +     * Detect if the server previously crashed under wal_level='none' or
> not.
> +     */
> +    if (ControlFile->wal_level == WAL_LEVEL_NONE &&
> +        (ControlFile->state != DB_SHUTDOWNED &&
> ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY))
> +        ereport(ERROR,
> +                (errmsg("detected an unexpected server
> shutdown when WAL logging was disabled"),
> +                 errhint("It looks like you need to deploy a
> new cluster from your full backup again.")));
> +
> +    /*
>       * Check that contents look valid.
>       */
>      if (!XRecOffIsValid(ControlFile->checkPoint))
> 
> I think the above safeguard should be placed right before the following code,
> because it's better to first check control file contents and emit the message
> for the previous shutdown state.
> 
> #ifdef XLOG_REPLAY_DELAY
>     if (ControlFile->state != DB_SHUTDOWNED)
>         pg_usleep(60000000L);
> #endif
Thank you. Perfectly makes sense. Fixed.


> (3)
> @@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
>          return EndPos;
>      }
> 
> +    /* Issues WAL only for XLOG resources */
> +    if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
> +    {
> +        XLogResetInsertion();
> +        return GetLatestCheckPointLSN();
> +    }
> 
> It should be correct to use GetXLogInsertRecPtr() instead of
> GetLatestCheckPointLSN(), because what XLogInsert() should return is the
> end position of last WAL record inserted (= start position of next WAL).
Ah, sorry. My patch was not correct.


> Why don't we add transaction completion WAL records?  That is, add
> "rmid != RM_XACT_ID" to the if condition.  What we really want to eliminate
> for performance is the data modification WAL records.  This might allow us
> to insert special checks to prevent PREPARE TRANSACTION and other stuff.
I apologize that you mentioned this point in your past review but I took it wrongly.
I fixed the condition and the comment for it.


> What happens without these checks? 
> (4)
> @@ -404,6 +404,10 @@
> pg_logical_emit_message_bytea(PG_FUNCTION_ARGS)
>      bytea       *data = PG_GETARG_BYTEA_PP(2);
>      XLogRecPtr    lsn;
> 
> +    if (wal_level < WAL_LEVEL_REPLICA)
> +        ereport(ERROR,
> +                errmsg("propagating a message requires
> wal_level \"replica\" or \"logical\""));
> +
>      lsn = LogLogicalMessage(prefix, VARDATA_ANY(data),
> VARSIZE_ANY_EXHDR(data),
>                              transactional);
>      PG_RETURN_LSN(lsn);
> 
> @@ -192,6 +192,10 @@ replorigin_check_prerequisites(bool check_slots,
> bool recoveryOK)
> 
>     (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
>                   errmsg("cannot manipulate replication
> origins during recovery")));
> 
> +    if (wal_level < WAL_LEVEL_REPLICA)
> +        ereport(ERROR,
> +                errmsg("creating replication origins requires
> wal_level \"replica\" or \"logical\""));
> +
>  }
Those functions doesn't do something meaningful when wal_level <= minimal.
For example, pg_logical_emit_message_bytea calls XLogInsert with RM_LOGICALMSG_ID rmid.
This doesn't generate WAL during wal_level=none,
which makes the usage of the function nonsense.

But, the latest patch I attached removed this part of modification,
because this can be applied to wal_level = minimal also and
I should make the patch focus on its purpose only.


> @@ -625,6 +625,9 @@ standard_ProcessUtility(PlannedStmt *pstmt,
>                          break;
> 
>                      case TRANS_STMT_PREPARE:
> +                        if (wal_level ==
> WAL_LEVEL_NONE)
> +                            ereport(ERROR,
> +
>     errmsg("cannot execute PREPARE TRANSACTION when WAL
> logging is disabled"));
Usually, this safeguard is not needed for PREPARE TRANSACTION.
But, when an unexpected crash happens, user needs
to recreate the cluster from the backup and
execute the operations that are executed into the crashed server again
if the user sets wal_level=none.

While 2PC guarantees that after issuing PREPARE TRANSACTION,
the processing or the contents of the transaction becomes *secure*,
setting wal_level=none makes the server never start up again if a database crash occurs.
In other words, this safeguard of the new wal_level damages
the important aspect of PREPARE TRANSACTION's functionality, in my opinion.

I don't think this is what it should be.

Therefore, I don't want users to utilize the PREPARE TRANSACTION still
in order to prevent that user thinks that they can use PREPARE TRANSACTION
safely during wal_level=none and execute it.
Did it make sense ?

Lastly, I fixed the documentation related to the types of generated WAL as well.

Best,
    Takamichi Osumi

Attachment

Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> On Friday, Nov 20, 2020 9:33 AM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > > At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost <sfrost@snowman.net>
> > > > * Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> > > > > I missed that this is only a warning when I looked at it before.
> > > > > Yes, it should be a fatal error.
> > > >
> > > > Yeah, the more that I think about it, the more that I tend to agree
> > > > with this.  Does anyone want to argue against changing this into a
> > FATAL..?
> > >
> > > I don't come up with a use case where someone needs to set
> > > wal_level=minimal for archive recovery. So +1 to change it to FATAL.
> It seems we reached the agreement for this fix.
>
> > Thank you all.  I'd like to give Osumi-san an opportunity to write a patch and
> > showing the evidence of successful test if you don't mind.  He is very young
> > and newbie to Postgres, so even a small contribution would encourage him to
> > make further contributions.
> Sure. Let me do that, including the test.

That all sounds good to me.  I would recommend starting a new thread on
-hackers with the patch, once it's ready, since it's really independent
from the topic of this thread.

Thanks,

Stephen

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> >                      case TRANS_STMT_PREPARE:
> > +                        if (wal_level ==
> > WAL_LEVEL_NONE)
> > +                            ereport(ERROR,
> > +
> >     errmsg("cannot execute PREPARE TRANSACTION when WAL logging
> is
> > disabled"));
> Usually, this safeguard is not needed for PREPARE TRANSACTION.
> But, when an unexpected crash happens, user needs to recreate the cluster
> from the backup and execute the operations that are executed into the crashed
> server again if the user sets wal_level=none.
> 
> While 2PC guarantees that after issuing PREPARE TRANSACTION, the
> processing or the contents of the transaction becomes *secure*, setting
> wal_level=none makes the server never start up again if a database crash
> occurs.
> In other words, this safeguard of the new wal_level damages the important
> aspect of PREPARE TRANSACTION's functionality, in my opinion.
> 
> I don't think this is what it should be.
> 
> Therefore, I don't want users to utilize the PREPARE TRANSACTION still in
> order to prevent that user thinks that they can use PREPARE TRANSACTION
> safely during wal_level=none and execute it.
> Did it make sense ?

PREPARE TRANSACTION is the same as COMMIT in that it persists transaction updates.  A crash during wal_level = none
losesboth of them.  So, I don't think PREPARE TRANSACTION needs special treatment.
 

Does PREPARE TRANSACTION complete successfully?  I remember Fujii-san said it PANICed if --enable-asserts is used in
configure.


Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> This time, I updated my patch to address comments below only.

I forgot to mentionthis.  I confirmed all review comments are reflected correctly.


Regards
Takayuki Tsunakawa


RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi

On Monday, Nov 23, 2020 12:17 PM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> PREPARE TRANSACTION is the same as COMMIT in that it persists
> transaction updates.  A crash during wal_level = none loses both of them.
> So, I don't think PREPARE TRANSACTION needs special treatment.
Yeah, I got it. That makes sense.
Attached is the one I removed the special treatment.


> Does PREPARE TRANSACTION complete successfully?  I remember
> Fujii-san said it PANICed if --enable-asserts is used in configure.
Yes. That assertion failure Fujii-San reported in the past
was solved by having rmid != RM_XACT_ID in XLogInsert()
to add WAL generation for transaction completes.
That I missed the condition was the cause but fine now.

Plus, PREPARE TRANSACTION and COMMIT PREPARED
works successfully at present when no happening occurs.

Best,
    Takamichi Osumi


Attachment

Re: Disable WAL logging to speed up data loading

From
Masahiko Sawada
Date:
On Tue, Nov 24, 2020 at 11:19 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hi
>
> On Monday, Nov 23, 2020 12:17 PM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> > PREPARE TRANSACTION is the same as COMMIT in that it persists
> > transaction updates.  A crash during wal_level = none loses both of them.
> > So, I don't think PREPARE TRANSACTION needs special treatment.
> Yeah, I got it. That makes sense.
> Attached is the one I removed the special treatment.
>
>
> > Does PREPARE TRANSACTION complete successfully?  I remember
> > Fujii-san said it PANICed if --enable-asserts is used in configure.
> Yes. That assertion failure Fujii-San reported in the past
> was solved by having rmid != RM_XACT_ID in XLogInsert()
> to add WAL generation for transaction completes.
> That I missed the condition was the cause but fine now.
>
> Plus, PREPARE TRANSACTION and COMMIT PREPARED
> works successfully at present when no happening occurs.
>

Thank you for updating the patch.

-               (errmsg("WAL was generated with wal_level=minimal,
data may be missing"),
+               (errmsg("WAL was generated with wal_level<=minimal,
data may be missing"),
                 errhint("This happens if you temporarily set
wal_level=minimal without taking a new base backup.")));

'wal_level=minimal' in errhint also needs to be changed to 'wal_level<=minimal'?

While testing the patch on some workload, I realized that
XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
none. IIUC that WAL record is not necessary during wal_level = none
since the server cannot be the primary server and the server crash
ends up requiring to restore the whole database.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Fri, 27 Nov 2020 15:07:34 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in 
> On Tue, Nov 24, 2020 at 11:19 AM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > Hi
> >
> > On Monday, Nov 23, 2020 12:17 PM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> > > PREPARE TRANSACTION is the same as COMMIT in that it persists
> > > transaction updates.  A crash during wal_level = none loses both of them.
> > > So, I don't think PREPARE TRANSACTION needs special treatment.
> > Yeah, I got it. That makes sense.
> > Attached is the one I removed the special treatment.
> >
> >
> > > Does PREPARE TRANSACTION complete successfully?  I remember
> > > Fujii-san said it PANICed if --enable-asserts is used in configure.
> > Yes. That assertion failure Fujii-San reported in the past
> > was solved by having rmid != RM_XACT_ID in XLogInsert()
> > to add WAL generation for transaction completes.
> > That I missed the condition was the cause but fine now.
> >
> > Plus, PREPARE TRANSACTION and COMMIT PREPARED
> > works successfully at present when no happening occurs.
> >
> 
> Thank you for updating the patch.
> 
> -               (errmsg("WAL was generated with wal_level=minimal,
> data may be missing"),
> +               (errmsg("WAL was generated with wal_level<=minimal,
> data may be missing"),
>                  errhint("This happens if you temporarily set
> wal_level=minimal without taking a new base backup.")));
> 
> 'wal_level=minimal' in errhint also needs to be changed to 'wal_level<=minimal'?
> 
> While testing the patch on some workload, I realized that
> XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> none. IIUC that WAL record is not necessary during wal_level = none
> since the server cannot be the primary server and the server crash
> ends up requiring to restore the whole database.

It seems to be on purpose.

@@ -449,6 +449,14 @@ XLogInsert(RmgrId rmid, uint8 info)
         return EndPos;
     }
 
+    /* Issues WAL related to XLOG resources and transactions only */
+    if (wal_level == WAL_LEVEL_NONE &&
+        rmid != RM_XLOG_ID && rmid != RM_XACT_ID)
+    {
+        XLogResetInsertion();
+        return GetXLogInsertRecPtr();

What is the reason for those kinds of records to be emitted?

I think we must emit at least the shutdown checkpoint record, which
has redo-LSN that points to the record itself.  I'm not sure what
other kinds of records are needed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Masahiko Sawada <sawada.mshk@gmail.com>
> While testing the patch on some workload, I realized that
> XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> none. IIUC that WAL record is not necessary during wal_level = none
> since the server cannot be the primary server and the server crash
> ends up requiring to restore the whole database.

Nice catch!  XLOG_FPI_FOR_HINT and XLOG_FPI should be eliminated, otherwise large amount of WAL may be written.  (It
seemsthat other RMIDs than RM_XLOG_ID and RM_XACT_ID do not have to be written.)
 

I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and RM_XACT_ID WAL records, except for XLOG_FPI_*,
areemitted.  What's the good name?  IIUC, "minimal" is named after the fact that the minimal amount of WAL necessary
forcrash recovery is generated.  "norecovery" or "unrecoverable"?
 

Regards
Takayuki Tsunakawa


Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Fri, 27 Nov 2020 07:01:16 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Masahiko Sawada <sawada.mshk@gmail.com>
> > While testing the patch on some workload, I realized that
> > XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> > none. IIUC that WAL record is not necessary during wal_level = none
> > since the server cannot be the primary server and the server crash
> > ends up requiring to restore the whole database.
> 
> Nice catch!  XLOG_FPI_FOR_HINT and XLOG_FPI should be eliminated, otherwise large amount of WAL may be written.  (It
seemsthat other RMIDs than RM_XLOG_ID and RM_XACT_ID do not have to be written.)
 
> 
> I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and RM_XACT_ID WAL records, except for
XLOG_FPI_*,are emitted.  What's the good name?  IIUC, "minimal" is named after the fact that the minimal amount of WAL
necessaryfor crash recovery is generated.  "norecovery" or "unrecoverable"?
 

I haven't seen a criteria of whether a record is emitted or not for
wal_leve=none.

We're emitting only redo logs.  So I think theoretically we don't need
anything other than the shutdown checkpoint record because we don't
perform recovery and checkpoint record is required at startup.

RM_XLOG_ID:
 XLOG_FPI_FOR_HINT  - not needed?
 XLOG_FPI           - not needed?

 XLOG_CHECKPOINT_SHUTDOWN - must have

So how about the followings?
 XLOG_CHECKPOINT_ONLINE
 XLOG_NOOP
 XLOG_NEXTOID
 XLOG_SWITCH
 XLOG_BACKUP_END
 XLOG_PARAMETER_CHANGE
 XLOG_RESTORE_POINT
 XLOG_FPW_CHANGE
 XLOG_END_OF_RECOVERY


RM_XACT_ID:
 XLOG_XACT_COMMIT
 XLOG_XACT_PREPARE
 XLOG_XACT_ABORT
 XLOG_XACT_COMMIT_PREPARED
 XLOG_XACT_ABORT_PREPARED
 XLOG_XACT_ASSIGNMENT
 XLOG_XACT_INVALIDATIONS

Do we need all of these?

And, currenly what decides whether to emit a wal record according to
wal_level is the caller of XLogInsert. So doing this at
XLogInsert-level means that we bring the criteria of the necessity of
wal-record into xlog layer only for wal_level=none. I'm not sure it is
the right direction.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello, Sawada-San

On Friday, November 27, 2020 3:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> -               (errmsg("WAL was generated with wal_level=minimal,
> data may be missing"),
> +               (errmsg("WAL was generated with wal_level<=minimal,
> data may be missing"),
>                  errhint("This happens if you temporarily set
> wal_level=minimal without taking a new base backup.")));
> 
> 'wal_level=minimal' in errhint also needs to be changed to
> 'wal_level<=minimal'?
Yeah, thanks. I'll fix this point in the next patch.


> While testing the patch on some workload, I realized that
> XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> none. IIUC that WAL record is not necessary during wal_level = none since
> the server cannot be the primary server and the server crash ends up requiring
> to restore the whole database.
That's right. Yeah, I'm aware of the fact that
we can refine the types of WAL. Basically, I thought the amount of 
WALs related to XLOG and XACT should be less than that of other types and
generating those doesn't have a serious impact on the peformance.
But anyway, thanks for your advice !


Best,
    Takamichi Osumi


RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi, Tsunakawa-San


> I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> good name?  IIUC, "minimal" is named after the fact that the minimal
> amount of WAL necessary for crash recovery is generated.  "norecovery" or
> "unrecoverable"?
Really good point !

Yes, it cannot be helped to generate tiny grain of WALs.
I prefer "unrecoverable" to "none",
which should work to make any users utilize this feature really really carefully.
If there's no strong objection, I'll change the name to it from next time.

Best,
    Takamichi Osumi


RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Thank you, Horiguchi-San

> I haven't seen a criteria of whether a record is emitted or not for
> wal_leve=none.
>
> We're emitting only redo logs.  So I think theoretically we don't need anything
> other than the shutdown checkpoint record because we don't perform
> recovery and checkpoint record is required at startup.
>
> RM_XLOG_ID:
>  XLOG_FPI_FOR_HINT  - not needed?
>  XLOG_FPI           - not needed?
>
>  XLOG_CHECKPOINT_SHUTDOWN - must have
>
> So how about the followings?
>  XLOG_CHECKPOINT_ONLINE
>  XLOG_NOOP
>  XLOG_NEXTOID
>  XLOG_SWITCH
>  XLOG_BACKUP_END
>  XLOG_PARAMETER_CHANGE
>  XLOG_RESTORE_POINT
>  XLOG_FPW_CHANGE
>  XLOG_END_OF_RECOVERY
>
>
> RM_XACT_ID:
>  XLOG_XACT_COMMIT
>  XLOG_XACT_PREPARE
>  XLOG_XACT_ABORT
>  XLOG_XACT_COMMIT_PREPARED
>  XLOG_XACT_ABORT_PREPARED
>  XLOG_XACT_ASSIGNMENT
>  XLOG_XACT_INVALIDATIONS
>
> Do we need all of these?
No. Strictly speaking, you are right.
We still have types of WAL that are not necessarily needed.
For example, XLOG_END_OF_RECOVERY is not useful
because wal_level=none doesn't recover from any accidents.
Or, XLOG_CHECKPOINT_ONLINE is used when we execute CHECKPOINT
not for shutting down. Thus we could eliminate more.

> And, currenly what decides whether to emit a wal record according to
> wal_level is the caller of XLogInsert.
Yes.

> So doing this at XLogInsert-level means
> that we bring the criteria of the necessity of wal-record into xlog layer only for
> wal_level=none. I'm not sure it is the right direction.
I'm sorry. I didn't understand what "doing this" and "xlog layer" meant.
Did you mean that fixing the caller side of XLogInsert (e.g. CreateCheckPoint)
is not the right direction ? Or, fixing the function of XLogInsert is not the right direction ?


> At Fri, 27 Nov 2020 07:01:16 +0000, "tsunakawa.takay@fujitsu.com"
> <tsunakawa.takay@fujitsu.com> wrote in
> > I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> good name?  IIUC, "minimal" is named after the fact that the minimal
> amount of WAL necessary for crash recovery is generated.  "norecovery" or
> "unrecoverable"?
Lastly, I found another name which expresses the essential characteristic of this wal_level.
How about the name of wal_level="crash_unsafe" ?
What did you think ?


Best,
    Takamichi Osumi




Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Fri, 27 Nov 2020 13:34:49 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in 
> Thank you, Horiguchi-San
> 
> > I haven't seen a criteria of whether a record is emitted or not for
> > wal_leve=none.
> > 
> > We're emitting only redo logs.  So I think theoretically we don't need anything
> > other than the shutdown checkpoint record because we don't perform
> > recovery and checkpoint record is required at startup.
> > 
> > RM_XLOG_ID:
> >  XLOG_FPI_FOR_HINT  - not needed?
> >  XLOG_FPI           - not needed?
> > 
> >  XLOG_CHECKPOINT_SHUTDOWN - must have
> > 
> > So how about the followings?
> >  XLOG_CHECKPOINT_ONLINE
> >  XLOG_NOOP
> >  XLOG_NEXTOID
> >  XLOG_SWITCH
> >  XLOG_BACKUP_END
> >  XLOG_PARAMETER_CHANGE
> >  XLOG_RESTORE_POINT
> >  XLOG_FPW_CHANGE
> >  XLOG_END_OF_RECOVERY
> > 
> > 
> > RM_XACT_ID:
> >  XLOG_XACT_COMMIT
> >  XLOG_XACT_PREPARE
> >  XLOG_XACT_ABORT
> >  XLOG_XACT_COMMIT_PREPARED
> >  XLOG_XACT_ABORT_PREPARED
> >  XLOG_XACT_ASSIGNMENT
> >  XLOG_XACT_INVALIDATIONS
> > 
> > Do we need all of these?
> No. Strictly speaking, you are right.
> We still have types of WAL that are not necessarily needed.
> For example, XLOG_END_OF_RECOVERY is not useful
> because wal_level=none doesn't recover from any accidents.
> Or, XLOG_CHECKPOINT_ONLINE is used when we execute CHECKPOINT
> not for shutting down. Thus we could eliminate more.

Yeah, although it's enough only to restrict non-harmful records
practically, if we find that only a few kinds of records are needed,
maybe it's cleaner to allow only required record type(s).

> > And, currenly what decides whether to emit a wal record according to
> > wal_level is the caller of XLogInsert. 
> Yes.
> 
> > So doing this at XLogInsert-level means
> > that we bring the criteria of the necessity of wal-record into xlog layer only for
> > wal_level=none. I'm not sure it is the right direction.
> I'm sorry. I didn't understand what "doing this" and "xlog layer" meant.

"doing this" meant filtering record types.

> Did you mean that fixing the caller side of XLogInsert (e.g. CreateCheckPoint)
> is not the right direction ? Or, fixing the function of XLogInsert is not the right direction ?

Maybe it's right that if we can filter-out records looking only rmid,
since the xlog facility doesn't need to know about record types of a
resource manager.  But if we need to finer-grained control on the
record types, I'm afraid that that's wrong.  However, if we need only
the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
XLogInsert filter records rather than inserting that filtering code to
all the caller sites.

> > At Fri, 27 Nov 2020 07:01:16 +0000, "tsunakawa.takay@fujitsu.com"
> > <tsunakawa.takay@fujitsu.com> wrote in
> > > I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> > RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> > good name?  IIUC, "minimal" is named after the fact that the minimal
> > amount of WAL necessary for crash recovery is generated.  "norecovery" or
> > "unrecoverable"?
> Lastly, I found another name which expresses the essential characteristic of this wal_level.
> How about the name of wal_level="crash_unsafe" ?
> What did you think ?

I don't dislike "none" since it seems to me practically "none".  It
seems rather correct if we actually need only the shutdown checkpoint
record.

"unrecoverable" is apparently misleading. "crash_unsafe" is precise
but seems somewhat alien being among "logical", "replica" and
"minimal".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Disable WAL logging to speed up data loading

From
Masahiko Sawada
Date:
()

On Mon, Nov 30, 2020 at 2:39 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Fri, 27 Nov 2020 13:34:49 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in
> > Thank you, Horiguchi-San
> >
> > > I haven't seen a criteria of whether a record is emitted or not for
> > > wal_leve=none.
> > >
> > > We're emitting only redo logs.  So I think theoretically we don't need anything
> > > other than the shutdown checkpoint record because we don't perform
> > > recovery and checkpoint record is required at startup.
> > >
> > > RM_XLOG_ID:
> > >  XLOG_FPI_FOR_HINT  - not needed?
> > >  XLOG_FPI           - not needed?
> > >
> > >  XLOG_CHECKPOINT_SHUTDOWN - must have
> > >
> > > So how about the followings?
> > >  XLOG_CHECKPOINT_ONLINE
> > >  XLOG_NOOP
> > >  XLOG_NEXTOID
> > >  XLOG_SWITCH
> > >  XLOG_BACKUP_END
> > >  XLOG_PARAMETER_CHANGE
> > >  XLOG_RESTORE_POINT
> > >  XLOG_FPW_CHANGE
> > >  XLOG_END_OF_RECOVERY
> > >
> > >
> > > RM_XACT_ID:
> > >  XLOG_XACT_COMMIT
> > >  XLOG_XACT_PREPARE
> > >  XLOG_XACT_ABORT
> > >  XLOG_XACT_COMMIT_PREPARED
> > >  XLOG_XACT_ABORT_PREPARED
> > >  XLOG_XACT_ASSIGNMENT
> > >  XLOG_XACT_INVALIDATIONS
> > >
> > > Do we need all of these?
> > No. Strictly speaking, you are right.
> > We still have types of WAL that are not necessarily needed.
> > For example, XLOG_END_OF_RECOVERY is not useful
> > because wal_level=none doesn't recover from any accidents.
> > Or, XLOG_CHECKPOINT_ONLINE is used when we execute CHECKPOINT
> > not for shutting down. Thus we could eliminate more.
>
> Yeah, although it's enough only to restrict non-harmful records
> practically, if we find that only a few kinds of records are needed,
> maybe it's cleaner to allow only required record type(s).
>
> > > And, currenly what decides whether to emit a wal record according to
> > > wal_level is the caller of XLogInsert.
> > Yes.
> >
> > > So doing this at XLogInsert-level means
> > > that we bring the criteria of the necessity of wal-record into xlog layer only for
> > > wal_level=none. I'm not sure it is the right direction.
> > I'm sorry. I didn't understand what "doing this" and "xlog layer" meant.
>
> "doing this" meant filtering record types.
>
> > Did you mean that fixing the caller side of XLogInsert (e.g. CreateCheckPoint)
> > is not the right direction ? Or, fixing the function of XLogInsert is not the right direction ?
>
> Maybe it's right that if we can filter-out records looking only rmid,
> since the xlog facility doesn't need to know about record types of a
> resource manager.  But if we need to finer-grained control on the
> record types, I'm afraid that that's wrong.

I also think we should have the caller of XLogInsert() control whether
or not to write a WAL record according to the setting of wal_level
like XLogStandbyInfoActive() and XLogLogicalInfoActive(). Since most
callers seem to assume the wal record will be written when calling to
XLogInsert() I think it's better not to break that assumption.

> However, if we need only
> the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
> XLogInsert filter records rather than inserting that filtering code to
> all the caller sites.
>
> > > At Fri, 27 Nov 2020 07:01:16 +0000, "tsunakawa.takay@fujitsu.com"
> > > <tsunakawa.takay@fujitsu.com> wrote in
> > > > I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> > > RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> > > good name?  IIUC, "minimal" is named after the fact that the minimal
> > > amount of WAL necessary for crash recovery is generated.  "norecovery" or
> > > "unrecoverable"?
> > Lastly, I found another name which expresses the essential characteristic of this wal_level.
> > How about the name of wal_level="crash_unsafe" ?
> > What did you think ?
>
> I don't dislike "none" since it seems to me practically "none".

Me too.

> It seems rather correct if we actually need only the shutdown checkpoint
> record.
>
> "unrecoverable" is apparently misleading. "crash_unsafe" is precise
> but seems somewhat alien being among "logical", "replica" and
> "minimal".

Since the WAL protocol is for durability I'm concerned the name
'crash_unsafe' (and 'unrecoverable') leads to a contradiction.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> We're emitting only redo logs.  So I think theoretically we don't need
> anything other than the shutdown checkpoint record because we don't
> perform recovery and checkpoint record is required at startup.
>
> RM_XLOG_ID:
>  XLOG_FPI_FOR_HINT  - not needed?
>  XLOG_FPI           - not needed?
>
>  XLOG_CHECKPOINT_SHUTDOWN - must have
>
> So how about the followings?
>  XLOG_CHECKPOINT_ONLINE
>  XLOG_NOOP
>  XLOG_NEXTOID
>  XLOG_SWITCH
>  XLOG_BACKUP_END
>  XLOG_PARAMETER_CHANGE
>  XLOG_RESTORE_POINT
>  XLOG_FPW_CHANGE
>  XLOG_END_OF_RECOVERY
>
>
> RM_XACT_ID:
>  XLOG_XACT_COMMIT
>  XLOG_XACT_PREPARE
>  XLOG_XACT_ABORT
>  XLOG_XACT_COMMIT_PREPARED
>  XLOG_XACT_ABORT_PREPARED
>  XLOG_XACT_ASSIGNMENT
>  XLOG_XACT_INVALIDATIONS
>
> Do we need all of these?

What need to be emitted even when wal_level = none are:

RM_XLOG_ID:
- XLOG_CHECKPOINT_SHUTDOWN
- XLOG_PARAMETER_CHANGE

RM_XACT_ID:
-  XLOG_XACT_PREPARE

XLOG_PARAMETER_CHANGE is necessary to detect during archive recovery that wal_level was changed from >= replica to
none,thus failing the archive recovery.  This is for the fix discussed in this thread to change the error level from
WARNINGto FATAL. 


> Yeah, although it's enough only to restrict non-harmful records
> practically, if we find that only a few kinds of records are needed,
> maybe it's cleaner to allow only required record type(s).
>
> Maybe it's right that if we can filter-out records looking only rmid,
> since the xlog facility doesn't need to know about record types of a
> resource manager.  But if we need to finer-grained control on the
> record types, I'm afraid that that's wrong.  However, if we need only
> the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
> XLogInsert filter records rather than inserting that filtering code to
> all the caller sites.

Agreed.  As the kind of WAL records to be emitted is very limited, I think XLogInsert() can filter them where the
currentpatch does.  (OTOH, the boot strap mode filters WAL coarsely as follows.  I thought we may follow this coarse
RMID-basedfiltering as a pessimistic safeguard against new kind of WAL records that are necessary even in wal_level =
none.)

    /*
     * In bootstrap mode, we don't actually log anything but XLOG resources;
     * return a phony record pointer.
     */
    if (IsBootstrapProcessingMode() && rmid != RM_XLOG_ID)
    {
        XLogResetInsertion();
        EndPos = SizeOfXLogLongPHD; /* start of 1st chkpt record */
        return EndPos;
    }


> I don't dislike "none" since it seems to me practically "none".  It
> seems rather correct if we actually need only the shutdown checkpoint
> record.
>
> "unrecoverable" is apparently misleading. "crash_unsafe" is precise
> but seems somewhat alien being among "logical", "replica" and
> "minimal".

OK, I'm happy with "none" too.  We can describe in the manual that some limited amount of WAL is emitted.


Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi


Sorry for being late.
On Tuesday, December 1, 2020 10:42 AM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > Yeah, although it's enough only to restrict non-harmful records
> > practically, if we find that only a few kinds of records are needed,
> > maybe it's cleaner to allow only required record type(s).
> >
> > Maybe it's right that if we can filter-out records looking only rmid,
> > since the xlog facility doesn't need to know about record types of a
> > resource manager.  But if we need to finer-grained control on the
> > record types, I'm afraid that that's wrong.  However, if we need only
> > the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
> > XLogInsert filter records rather than inserting that filtering code to
> > all the caller sites.
>
> Agreed.  As the kind of WAL records to be emitted is very limited, I think
> XLogInsert() can filter them where the current patch does.
Yeah, I can do that.
I'll restrict the types of WAL in a strict manner,
which means filtering out the types of WAL in XLogInsert().
I'll modify this point in the next patch.


> > I don't dislike "none" since it seems to me practically "none".  It
> > seems rather correct if we actually need only the shutdown checkpoint
> > record.
> >
> > "unrecoverable" is apparently misleading. "crash_unsafe" is precise
> > but seems somewhat alien being among "logical", "replica" and
> > "minimal".
>
> OK, I'm happy with "none" too.  We can describe in the manual that some
> limited amount of WAL is emitted.
Sure, no problem. Let's keep adopting "none", then.

By the way, I've conducted one additional performance evaluation
to see whether reducing WAL could contribute to boost the speed of data loading or not.

I prepared a test VM with 24 cores, HDD and 128GB memory
and used a 9.3GB input file which can be generated by pgbench's scale factor 1000.
The test table is a partitioned table whose number of child tables is 20.
The reason why I choose partitioned table is to remove the concern
that any delay of the loading time comes from lock contention of table.
Therefore, I divided the input file into 20 files and loaded each file
by different 20 sessions respectively toward each correspondent table.
I compared wal_level=minimal and wal_level=none this time
by using the server with the PG configurations below.

-----------------------------------------
wal_level = minimal or minimal
archive_mode = off
max_prepared_transactions = 128
max_worker_processes = 128
max_parallel_workers = 128
max_wal_senders = 0
max_wal_size = 100GB
wal_buffers = 1GB
checkpoint_timeout = 1d
shared_buffers = 32GB # 25% of RAM
maintenance_work_mem = 26GB # 20% of RAM
work_mem = 26GB # 20% of RAM
------------------------------------------

I executed each wal_level three times and calculated the average time
and found that disabling WAL logging reduced about 73 % of the minimal's loading speed
in this test. This speed-up came from the difference of generated WAL sizes.
In this test, to load the data generated more than 10GB of WALs with wal_level=minimal
while wal_level=none emits just 760 bytes of WALs.

I double-checked this fact from pg_wal_lsn_diff() for each wal_level
and had a look at the folder size of pg_wal.
I expect this size for none will become smaller when
I take the modification to filter out the types of WAL which is discussed above.
Also, I monitored numbers of iostat's 'util' and noticed that
util's spike to use I/O reduced from twice to once when I changed the level
from minimal to none, which should be the effect of the patch.

Any comments ?

Best,
    Takamichi Osumi




RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> I executed each wal_level three times and calculated the average time
> and found that disabling WAL logging reduced about 73 % of the minimal's
> loading speed
> in this test. This speed-up came from the difference of generated WAL sizes.

So, it's 4x speedup when the WAL buffer and/or disk is likely to be saturated.  That's nice!


> In this test, to load the data generated more than 10GB of WALs with
> wal_level=minimal
> while wal_level=none emits just 760 bytes of WALs.
>
> I expect this size for none will become smaller when
> I take the modification to filter out the types of WAL which is discussed above.

I don't expect so, because the WAL volume is already very small in this workload (probably only the commit records.)


> Also, I monitored numbers of iostat's 'util' and noticed that
> util's spike to use I/O reduced from twice to once when I changed the level
> from minimal to none, which should be the effect of the patch.

It's a bit mysterious.  I thought %util would be constantly 100% when wal_level = minimal.  That may be because
wal_buffersis large. 


Regards
Takayuki Tsunakawa





RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello


I've made a new patch v05 that took in comments
to filter out WALs more strictly and addressed some minor fixes
that were discussed within past few days.
Also, I changed the documentations, considering those modifications.


Best,
    Takamichi Osumi


Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
        From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> I've made a new patch v05 that took in comments to filter out WALs more strictly
> and addressed some minor fixes that were discussed within past few days.
> Also, I changed the documentations, considering those modifications.

The code looks good, and the performance seems to be nice, so I marked this ready for committer.


Regards
Takayuki Tsunakawa




Re: Disable WAL logging to speed up data loading

From
Michael Paquier
Date:
On Thu, Dec 03, 2020 at 03:52:47AM +0000, tsunakawa.takay@fujitsu.com wrote:
> The code looks good, and the performance seems to be nice, so I
> marked this ready for committer.

FWIW, I am extremely afraid of this proposal because this is basically
a footgun able to corrupt customer instances, and I am ready to bet
that people would switch wal_level to none because they see a lot of
benefits in doing so at first sight, until the host running the server
is plugged off and they need to use pg_resetwal in urgency to bring an
instance online.  Users have already the option to make things go bad,
just by disabling full page writes or fsync, but I really don't think
that we should put in their hands more options able to break
instances, nor should we try to spend more efforts in having more
"protections" that would trigger only once the instance is already
fried.

Perhaps this is something that Horiguchi-san pointed out upthread in
[1] (last sentence of first paragraph), but did you consider that it
is already possible to do bulk-loading with a minimal amount of WAL
generated as long as you do the COPY within the transaction that
created the table?  Quoting the docs in [2]:
"COPY is fastest when used within the same transaction as an earlier
CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be
written, because in case of an error, the files containing the newly
loaded data will be removed anyway. However, this consideration only
applies when wal_level is minimal as all commands must write WAL
otherwise."

Upgrade scenarios have been mentioned in this case as being a pain
when it comes to take advantage of this optimization.  Wouldn't it be
safer if we took a client approach instead, where restores are able to
load the data with a cheap succession of commands by loading the data
in a transaction done after a TRUNCATE?

[1]: https://www.postgresql.org/message-id/20201001.115136.288898409051085426.horikyota.ntt@gmail.com
[2]: https://www.postgresql.org/docs/current/populate.html#POPULATE-COPY-FROM
--
Michael

Attachment

RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi, Michael


Thank you for your attention to this thread.
On Friday, December 25, 2020 4:09 PM  Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Dec 03, 2020 at 03:52:47AM +0000, tsunakawa.takay@fujitsu.com
> wrote:
> > The code looks good, and the performance seems to be nice, so I marked
> > this ready for committer.
>
> FWIW, I am extremely afraid of this proposal because this is basically a
> footgun able to corrupt customer instances, and I am ready to bet that people
> would switch wal_level to none because they see a lot of benefits in doing so
> at first sight, until the host running the server is plugged off and they need to
> use pg_resetwal in urgency to bring an instance online.
In the patch, I added plenty of descriptions (especially cautions) to the documents.
At least, when users notice the existence of "none" parameter for wal_level,
they cannot avoid having a look at such cautions.
Accordingly, it's really impossible that they see only the merits.

In terms of pg_resetwal,
it should not happen that they use it to utilize the server
corrupted by failure of any operation during wal_level=none.

I documented clearly
this wal_level is designed to make
the server never start up again when any unexpected crash is detected
and thus users need to recreate the whole cluster again.

> Users have already
> the option to make things go bad, just by disabling full page writes or fsync,
> but I really don't think that we should put in their hands more options able to
> break instances, nor should we try to spend more efforts in having more
> "protections" that would trigger only once the instance is already fried.
>
> Perhaps this is something that Horiguchi-san pointed out upthread in [1]
> (last sentence of first paragraph), but did you consider that it is already
> possible to do bulk-loading with a minimal amount of WAL generated as long
> as you do the COPY within the transaction that created the table?  Quoting
> the docs in [2]:
> "COPY is fastest when used within the same transaction as an earlier
> CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be
> written, because in case of an error, the files containing the newly loaded data
> will be removed anyway. However, this consideration only applies when
> wal_level is minimal as all commands must write WAL otherwise."
>
> Upgrade scenarios have been mentioned in this case as being a pain when it
> comes to take advantage of this optimization.  Wouldn't it be safer if we took
> a client approach instead, where restores are able to load the data with a
> cheap succession of commands by loading the data in a transaction done
> after a TRUNCATE?
In a data warehouse environment,
the environment of our scenario in this thread in mind,
I think that to truncate target table
beforehand in the same transaction is not always realistic solution.
Imagine an appended data loading case to execute bulk data load
into one specific table with a bunch of records.

Best Regards,
    Takamichi Osumi



Re: Disable WAL logging to speed up data loading

From
Masahiko Sawada
Date:
On Thu, Dec 3, 2020 at 12:14 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hello
>
>
> I've made a new patch v05 that took in comments
> to filter out WALs more strictly and addressed some minor fixes
> that were discussed within past few days.
> Also, I changed the documentations, considering those modifications.
>

From a backup management tool perspective, how can they detect that
wal_level has been changed to ‘none' (and back to the normal)? IIUC
once we changed wal_level to none, old backups that are taken before
setting to ‘none’ can be used only for restoring the database to the
point before the LSN where setting 'wal_level = none'. The users can
neither restore the database to any points in the term of 'wal_level =
none' nor use an old backup to restore the database to the point after
LSN where setting 'wal_level = none’. I think we might need to provide
a way to detect the changes other than reading XLOG_PARAMETER_CHANGE.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hello Sawada-San


On Monday, December 28, 2020 2:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Dec 3, 2020 at 12:14 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > I've made a new patch v05 that took in comments to filter out WALs
> > more strictly and addressed some minor fixes that were discussed
> > within past few days.
> > Also, I changed the documentations, considering those modifications.
> 
> From a backup management tool perspective, how can they detect that
> wal_level has been changed to ‘none' (and back to the normal)? IIUC once
> we changed wal_level to none, old backups that are taken before setting to
> ‘none’ can be used only for restoring the database to the point before the
> LSN where setting 'wal_level = none'. The users can neither restore the
> database to any points in the term of 'wal_level = none' nor use an old backup
> to restore the database to the point after LSN where setting 'wal_level =
> none’. I think we might need to provide a way to detect the changes other
> than reading XLOG_PARAMETER_CHANGE.
In the past, we discussed the aspect of backup management tool in [1]
and concluded that this should be another patch separated from this thread
because to compare the wal_level changes between snapshots
applies to wal_level = minimal, too. Please have a look at the "second idea"
in the e-mail in the [1] and responses to it.

 [1] -
https://www.postgresql.org/message-id/OSBPR01MB4888B34B81A6E0DD46B5D063EDE00%40OSBPR01MB4888.jpnprd01.prod.outlook.com

Best Regards,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Masahiko Sawada
Date:
On Mon, Dec 28, 2020 at 4:29 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hello Sawada-San
>
>
> On Monday, December 28, 2020 2:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Thu, Dec 3, 2020 at 12:14 PM osumi.takamichi@fujitsu.com
> > <osumi.takamichi@fujitsu.com> wrote:
> > >
> > > I've made a new patch v05 that took in comments to filter out WALs
> > > more strictly and addressed some minor fixes that were discussed
> > > within past few days.
> > > Also, I changed the documentations, considering those modifications.
> >
> > From a backup management tool perspective, how can they detect that
> > wal_level has been changed to ‘none' (and back to the normal)? IIUC once
> > we changed wal_level to none, old backups that are taken before setting to
> > ‘none’ can be used only for restoring the database to the point before the
> > LSN where setting 'wal_level = none'. The users can neither restore the
> > database to any points in the term of 'wal_level = none' nor use an old backup
> > to restore the database to the point after LSN where setting 'wal_level =
> > none’. I think we might need to provide a way to detect the changes other
> > than reading XLOG_PARAMETER_CHANGE.
> In the past, we discussed the aspect of backup management tool in [1]
> and concluded that this should be another patch separated from this thread
> because to compare the wal_level changes between snapshots
> applies to wal_level = minimal, too. Please have a look at the "second idea"
> in the e-mail in the [1] and responses to it.
>

Thank you for telling me about the discussion!

The discussion already started on another thread? I think it might be
better to consider it in parallel, if not started yet. We can verify
beforehand that the proposed solutions will really work well together
with backup management tools. And from the user perspective, I wonder
if they would like to see this feature in the same release where
wal_level = none is introduced. Otherwise, the fact that some backup
management tools won’t be able to work together with wal_level = none
will be a big restriction for users. That patch would probably not be
a blocker of this patch but will facilitate the discussion.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: Disable WAL logging to speed up data loading

From
Simon Riggs
Date:
On Fri, 25 Dec 2020 at 07:09, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Dec 03, 2020 at 03:52:47AM +0000, tsunakawa.takay@fujitsu.com wrote:
> > The code looks good, and the performance seems to be nice, so I
> > marked this ready for committer.
>
> FWIW, I am extremely afraid of this proposal because this is basically
> a footgun able to corrupt customer instances, and I am ready to bet
> that people would switch wal_level to none because they see a lot of
> benefits in doing so at first sight, until the host running the server
> is plugged off and they need to use pg_resetwal in urgency to bring an
> instance online.

Agreed, it is a footgun. -1 to commit the patch as-is.

The patch to avoid WAL is simple but it is dangerous for both the user
and the PostgreSQL project.

In my experience, people will use this option and when it crashes and
they lose their data, they will claim PostgreSQL broke and that they
were not stupid enough to use this option. Data loss has always been
the most serious error for PostgreSQL and our reputation for
protecting data has been hard won; it can easily be lost in a moment
of madness. Please consider how the headlines will read, bearing in
mind past incidents and negative press. Yes, we did think of this
feature already and rejected it.

If we ever did allow such an option, it must contain these things (IMHO):
* the option should be called "unsafe" or "allows_data_loss", not just
"none" (anything less than "minimal" must be insufficient or
unsafe...)
* the option must be set in the control file and be part of the same
patch, so users cannot easily edit things to hide their unsafe usage
* we must not support the option of going down to "unsafe" and then
back up again. It must be a one-way transition from "unsafe" to a
higher level, so if people want to use this for temp reporting servers
or initial loading, great, but they can't use it as a quick speed-up
for databases containing needs-to-be-safe data. Possibly the state
change might be "unsafe" -> "needs_backup" -> "minimal"... or some
other way to signal to backup.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Disable WAL logging to speed up data loading

From
Michael Paquier
Date:
On Mon, Dec 28, 2020 at 11:06:30AM +0000, Simon Riggs wrote:
> Agreed, it is a footgun. -1 to commit the patch as-is.
>
> The patch to avoid WAL is simple but it is dangerous for both the user
> and the PostgreSQL project.

Something that has not been mentioned on this thread is that if you
could also put pg_wal/ on a RAM disk.  That's similarly unsafe, of
course, but it does not require any extra upstream patching, and that
should be really fast for the case of this thread.  If you care about
consistency, there are solutions like PMEM that we could look more
into.

> In my experience, people will use this option and when it crashes and
> they lose their data, they will claim PostgreSQL broke and that they
> were not stupid enough to use this option. Data loss has always been
> the most serious error for PostgreSQL and our reputation for
> protecting data has been hard won; it can easily be lost in a moment
> of madness. Please consider how the headlines will read, bearing in
> mind past incidents and negative press.

Yeah..

> Yes, we did think of this feature already and rejected it.

I don't recall seeing such a discussion.  Do you have some references?
Just a matter of curiosity :)
--
Michael

Attachment

Re: Disable WAL logging to speed up data loading

From
Simon Riggs
Date:
On Wed, 30 Dec 2020 at 13:04, Michael Paquier <michael@paquier.xyz> wrote:

> > Yes, we did think of this feature already and rejected it.
>
> I don't recall seeing such a discussion.  Do you have some references?
> Just a matter of curiosity :)

Off-list discussions.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Michael Paquier <michael@paquier.xyz>
> Something that has not been mentioned on this thread is that if you could also
> put pg_wal/ on a RAM disk.  That's similarly unsafe, of course, but it does not
> require any extra upstream patching, and that should be really fast for the case
> of this thread.  If you care about consistency, there are solutions like PMEM
> that we could look more into.

As noted in this thread, the customer is worried about and wants to avoid handling the possible high volume storage of
WALduring the data loading.  That's understandable.  In that regard, DRAM and PMEM are worse because their capacity is
morelimited than SSD. 


Regards
Takayuki Tsunakawa





RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Simon Riggs <simon@2ndquadrant.com>
> Agreed, it is a footgun. -1 to commit the patch as-is.
> 
> The patch to avoid WAL is simple but it is dangerous for both the user
> and the PostgreSQL project.
> 
> In my experience, people will use this option and when it crashes and
> they lose their data, they will claim PostgreSQL broke and that they
> were not stupid enough to use this option. Data loss has always been
> the most serious error for PostgreSQL and our reputation for
> protecting data has been hard won; it can easily be lost in a moment
> of madness. Please consider how the headlines will read, bearing in
> mind past incidents and negative press. Yes, we did think of this
> feature already and rejected it.

Could you share the negative press that blames Postgres due to the user's misuse of some feature like fsync?


> If we ever did allow such an option, it must contain these things (IMHO):
> * the option should be called "unsafe" or "allows_data_loss", not just
> "none" (anything less than "minimal" must be insufficient or
> unsafe...)

One idea I proposed is "wal_level = unrecoverable" because it clearly states the bad consequence and Oracle also uses
UNRECOVERABLEas a data loading option (and I found it intuitive.)  But others here commented that "none" would be OK.
Idon't have a strong opinion on naming, as I think what's important is warn the user in the documentation.
 


> * the option must be set in the control file and be part of the same
> patch, so users cannot easily edit things to hide their unsafe usage

wal_level setting is stored in the control file as before.  If the server crashes while wal_level is set to none, the
serverrefuses to start saying that's because wal_level is set to none, which is like MySQL.  So, the user cannot hide
theirmisuse.
 


> * we must not support the option of going down to "unsafe" and then
> back up again. It must be a one-way transition from "unsafe" to a
> higher level, so if people want to use this for temp reporting servers
> or initial loading, great, but they can't use it as a quick speed-up
> for databases containing needs-to-be-safe data. Possibly the state
> change might be "unsafe" -> "needs_backup" -> "minimal"... or some
> other way to signal to backup.

I'm afraid I don't get a clear image.  Could you elaborate on that?  But anyway, I think that could be an overreaction
anda prominent caution would suffice (like the one in this patch.)
 


Regards
Takayuki Tsunakawa



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi, Sawada-San


On Monday, December 28, 2020 7:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Dec 28, 2020 at 4:29 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> > On Monday, December 28, 2020 2:29 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > > On Thu, Dec 3, 2020 at 12:14 PM osumi.takamichi@fujitsu.com
> > > <osumi.takamichi@fujitsu.com> wrote:
> > > >
> > > > I've made a new patch v05 that took in comments to filter out WALs
> > > > more strictly and addressed some minor fixes that were discussed
> > > > within past few days.
> > > > Also, I changed the documentations, considering those modifications.
> > >
> > > From a backup management tool perspective, how can they detect that
> > > wal_level has been changed to ‘none' (and back to the normal)? IIUC
> > > once we changed wal_level to none, old backups that are taken before
> > > setting to ‘none’ can be used only for restoring the database to the
> > > point before the LSN where setting 'wal_level = none'. The users can
> > > neither restore the database to any points in the term of 'wal_level
> > > = none' nor use an old backup to restore the database to the point
> > > after LSN where setting 'wal_level = none’. I think we might need to
> > > provide a way to detect the changes other than reading
> XLOG_PARAMETER_CHANGE.
> > In the past, we discussed the aspect of backup management tool in [1]
> > and concluded that this should be another patch separated from this
> > thread because to compare the wal_level changes between snapshots
> > applies to wal_level = minimal, too. Please have a look at the "second idea"
> > in the e-mail in the [1] and responses to it.
> >
> 
> Thank you for telling me about the discussion!
> 
> The discussion already started on another thread? I think it might be better to
> consider it in parallel, if not started yet. We can verify beforehand that the
> proposed solutions will really work well together with backup management
> tools. And from the user perspective, I wonder if they would like to see this
> feature in the same release where wal_level = none is introduced. Otherwise,
> the fact that some backup management tools won’t be able to work together
> with wal_level = none will be a big restriction for users. That patch would
> probably not be a blocker of this patch but will facilitate the discussion.
I don't think the new thread is created already.

By the way, I checked documents and user manuals of backup management tools like
pgBackRest, EDB's BART and pg_probackup respectively.
These tools work when wal_level is higher than minimal
because these use physical online backup or wal archiving and thus
they don't need to work together with wal_level=minimal or none.

> the fact that some backup management tools won’t be able to work together
> with wal_level = none will be a big restriction for users.
Accordingly, it turned out that current situation or introducing wal_level=none
doesn't become restriction for users from the backup management tools perspective.
Any comments ?

Best Regards,
    Takamichi Osumi

Re: Disable WAL logging to speed up data loading

From
Masahiko Sawada
Date:
On Tue, Jan 5, 2021 at 10:54 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hi, Sawada-San
>
>
> On Monday, December 28, 2020 7:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Mon, Dec 28, 2020 at 4:29 PM osumi.takamichi@fujitsu.com
> > <osumi.takamichi@fujitsu.com> wrote:
> > > On Monday, December 28, 2020 2:29 PM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > > > On Thu, Dec 3, 2020 at 12:14 PM osumi.takamichi@fujitsu.com
> > > > <osumi.takamichi@fujitsu.com> wrote:
> > > > >
> > > > > I've made a new patch v05 that took in comments to filter out WALs
> > > > > more strictly and addressed some minor fixes that were discussed
> > > > > within past few days.
> > > > > Also, I changed the documentations, considering those modifications.
> > > >
> > > > From a backup management tool perspective, how can they detect that
> > > > wal_level has been changed to ‘none' (and back to the normal)? IIUC
> > > > once we changed wal_level to none, old backups that are taken before
> > > > setting to ‘none’ can be used only for restoring the database to the
> > > > point before the LSN where setting 'wal_level = none'. The users can
> > > > neither restore the database to any points in the term of 'wal_level
> > > > = none' nor use an old backup to restore the database to the point
> > > > after LSN where setting 'wal_level = none’. I think we might need to
> > > > provide a way to detect the changes other than reading
> > XLOG_PARAMETER_CHANGE.
> > > In the past, we discussed the aspect of backup management tool in [1]
> > > and concluded that this should be another patch separated from this
> > > thread because to compare the wal_level changes between snapshots
> > > applies to wal_level = minimal, too. Please have a look at the "second idea"
> > > in the e-mail in the [1] and responses to it.
> > >
> >
> > Thank you for telling me about the discussion!
> >
> > The discussion already started on another thread? I think it might be better to
> > consider it in parallel, if not started yet. We can verify beforehand that the
> > proposed solutions will really work well together with backup management
> > tools. And from the user perspective, I wonder if they would like to see this
> > feature in the same release where wal_level = none is introduced. Otherwise,
> > the fact that some backup management tools won’t be able to work together
> > with wal_level = none will be a big restriction for users. That patch would
> > probably not be a blocker of this patch but will facilitate the discussion.
> I don't think the new thread is created already.
>
> By the way, I checked documents and user manuals of backup management tools like
> pgBackRest, EDB's BART and pg_probackup respectively.
> These tools work when wal_level is higher than minimal
> because these use physical online backup or wal archiving and thus
> they don't need to work together with wal_level=minimal or none.

Thank you for checking! The use case I imagined is that the user
temporarily changes wal_level to 'none' from 'replica' or 'logical' to
speed up loading and changes back to the normal. In this case, the
backups taken before the wal_level change cannot be used to restore
the database to the point after the wal_level change. So I think
backup management tools would want to recognize the time or LSN
when/where wal_level is changed to ‘none’ in order to do some actions
such as invalidating older backups, re-calculating backup redundancy
etc.

Actually the same is true when the user changes to ‘minimal’. The
tools would need to recognize the time or LSN in this case too. Since
temporarily changing wal_level has been an uncommon use case some
tools perhaps are not aware of that yet. But since introducing
wal_level = ’none’ could make the change common, I think we need to
provide a way for the tools.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi,


On Tuesday, January 5, 2021 5:45 PM
Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Jan 5, 2021 at 10:54 AM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> > On Monday, December 28, 2020 7:12 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > > On Mon, Dec 28, 2020 at 4:29 PM osumi.takamichi@fujitsu.com
> > > <osumi.takamichi@fujitsu.com> wrote:
> > > > On Monday, December 28, 2020 2:29 PM Masahiko Sawada
> > > <sawada.mshk@gmail.com> wrote:
> > > > > On Thu, Dec 3, 2020 at 12:14 PM osumi.takamichi@fujitsu.com
> > > > > <osumi.takamichi@fujitsu.com> wrote:
> > > > > >
> > > > > > I've made a new patch v05 that took in comments to filter out
> > > > > > WALs more strictly and addressed some minor fixes that were
> > > > > > discussed within past few days.
> > > > > > Also, I changed the documentations, considering those
> modifications.
> > > > >
> > > > > From a backup management tool perspective, how can they detect
> > > > > that wal_level has been changed to ‘none' (and back to the
> > > > > normal)? IIUC once we changed wal_level to none, old backups
> > > > > that are taken before setting to ‘none’ can be used only for
> > > > > restoring the database to the point before the LSN where setting
> > > > > 'wal_level = none'. The users can neither restore the database
> > > > > to any points in the term of 'wal_level = none' nor use an old
> > > > > backup to restore the database to the point after LSN where
> > > > > setting 'wal_level = none’. I think we might need to provide a
> > > > > way to detect the changes other than reading
> > > XLOG_PARAMETER_CHANGE.
> > > > In the past, we discussed the aspect of backup management tool in
> > > > [1] and concluded that this should be another patch separated from
> > > > this thread because to compare the wal_level changes between
> > > > snapshots applies to wal_level = minimal, too. Please have a look at the
> "second idea"
> > > > in the e-mail in the [1] and responses to it.
> > > >
> > >
> > > Thank you for telling me about the discussion!
> > >
> > > The discussion already started on another thread? I think it might
> > > be better to consider it in parallel, if not started yet. We can
> > > verify beforehand that the proposed solutions will really work well
> > > together with backup management tools. And from the user
> > > perspective, I wonder if they would like to see this feature in the
> > > same release where wal_level = none is introduced. Otherwise, the
> > > fact that some backup management tools won’t be able to work
> > > together with wal_level = none will be a big restriction for users. That patch
> would probably not be a blocker of this patch but will facilitate the discussion.
> > I don't think the new thread is created already.
> >
> > By the way, I checked documents and user manuals of backup management
> > tools like pgBackRest, EDB's BART and pg_probackup respectively.
> > These tools work when wal_level is higher than minimal because these
> > use physical online backup or wal archiving and thus they don't need
> > to work together with wal_level=minimal or none.
> 
> Thank you for checking! The use case I imagined is that the user temporarily
> changes wal_level to 'none' from 'replica' or 'logical' to speed up loading and
> changes back to the normal. In this case, the backups taken before the
> wal_level change cannot be used to restore the database to the point after the
> wal_level change. So I think backup management tools would want to
> recognize the time or LSN when/where wal_level is changed to ‘none’ in order
> to do some actions such as invalidating older backups, re-calculating backup
> redundancy etc.
> 
> Actually the same is true when the user changes to ‘minimal’. The tools would
> need to recognize the time or LSN in this case too. Since temporarily changing
> wal_level has been an uncommon use case some tools perhaps are not aware
> of that yet. But since introducing wal_level = ’none’ could make the change
> common, I think we need to provide a way for the tools.
Sawada-san, thanks a lot for your explanation and
for sure it's about time to launch a new thread for us.

This discussion can be separated from this wal_level thread and I agree with
your idea that what we are talking now would not be a blocker of the new wal_level patch.
So, kindly don't think that the new thread damages the content of this wal_level=none discussion first of all.
I'll create a new one soon.

Best Regards,
    Takamichi Osumi


Re: Disable WAL logging to speed up data loading

From
Robert Haas
Date:
On Wed, Dec 2, 2020 at 10:53 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> The code looks good, and the performance seems to be nice, so I marked this ready for committer.

Were the issues that I mentioned regarding GIST (and maybe other AMs)
in the last paragraph of
http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRWdmRevDnQ@mail.gmail.com
addressed in some way? That seems like a pretty hard engineering
problem to me, and I don't see that there's been any discussion of it.
Those are correctness concerns separate from any wal_level tracking we
might want to do to avoid accidental mistakes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com>
> Were the issues that I mentioned regarding GIST (and maybe other AMs)
> in the last paragraph of
> http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> dmRevDnQ@mail.gmail.com
> addressed in some way? That seems like a pretty hard engineering
> problem to me, and I don't see that there's been any discussion of it.
> Those are correctness concerns separate from any wal_level tracking we
> might want to do to avoid accidental mistakes.

Thank you very much for reminding me of this.  I forgot I replied as follows:


--------------------------------------------------
Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged temporary GiST indexes use backend-local sequence
values. Other unlogged and temporary relations don't set LSNs on pages.  So, I think it's enough to call
GetFakeLSNForUnloggedRel()when wal_level = none as well.
 
--------------------------------------------------


But this is not correct.  We have to allow (RM_GIST_ID, XLOG_GIST_ASSIGN_LSN) WAL records to be emitted (by tweaking
thefilter in XLogInsert()), just like those WAL records are emitted when wal_level = minimal and and other WAL records
arenot emitted.
 


Regards
Takayuki Tsunakawa


Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Fri, 8 Jan 2021 05:12:02 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Robert Haas <robertmhaas@gmail.com>
> > Were the issues that I mentioned regarding GIST (and maybe other AMs)
> > in the last paragraph of
> > http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> > dmRevDnQ@mail.gmail.com
> > addressed in some way? That seems like a pretty hard engineering
> > problem to me, and I don't see that there's been any discussion of it.
> > Those are correctness concerns separate from any wal_level tracking we
> > might want to do to avoid accidental mistakes.
> 
> Thank you very much for reminding me of this.  I forgot I replied as follows:
> 
> 
> --------------------------------------------------
> Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged temporary GiST indexes use backend-local
sequencevalues.  Other unlogged and temporary relations don't set LSNs on pages.  So, I think it's enough to call
GetFakeLSNForUnloggedRel()when wal_level = none as well.
 
> --------------------------------------------------
> 
> 
> But this is not correct.  We have to allow (RM_GIST_ID,
> XLOG_GIST_ASSIGN_LSN) WAL records to be emitted (by tweaking the
> filter in XLogInsert()), just like those WAL records are emitted
> when wal_level = minimal and and other WAL records are not emitted.

Yeah, although LOGGED and UNLOGGED use incompatible LSNs, LOGGED
tables always uses real LSNs, maintained by that type of WAL record
while wal_level = minimal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Disable WAL logging to speed up data loading

From
Masahiko Sawada
Date:
On Fri, Jan 8, 2021 at 2:12 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Robert Haas <robertmhaas@gmail.com>
> > Were the issues that I mentioned regarding GIST (and maybe other AMs)
> > in the last paragraph of
> > http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> > dmRevDnQ@mail.gmail.com
> > addressed in some way? That seems like a pretty hard engineering
> > problem to me, and I don't see that there's been any discussion of it.
> > Those are correctness concerns separate from any wal_level tracking we
> > might want to do to avoid accidental mistakes.
>
> Thank you very much for reminding me of this.  I forgot I replied as follows:
>
>
> --------------------------------------------------
> Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged temporary GiST indexes use backend-local
sequencevalues.  Other unlogged and temporary relations don't set LSNs on pages.  So, I think it's enough to call
GetFakeLSNForUnloggedRel()when wal_level = none as well. 
> --------------------------------------------------
>
>
> But this is not correct.  We have to allow (RM_GIST_ID, XLOG_GIST_ASSIGN_LSN) WAL records to be emitted (by tweaking
thefilter in XLogInsert()), just like those WAL records are emitted when wal_level = minimal and and other WAL records
arenot emitted. 

I think it's better to have index AM (and perhaps table AM) control it
instead of filtering in XLogInsert(). Because otherwise third-party
access methods that use LSN like gist indexes cannot write WAL records
at all when wal_level = none even if they want.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Masahiko Sawada <sawada.mshk@gmail.com>
> I think it's better to have index AM (and perhaps table AM) control it
> instead of filtering in XLogInsert(). Because otherwise third-party
> access methods that use LSN like gist indexes cannot write WAL records
> at all when wal_level = none even if they want.

Hm, third-party extensions use RM_GENERIC_ID, so it should probably be allowed in XLogInsert() as well instead of
allowingcontrol in some other way.  It's not unnatural that WAL records for in-core AMs are filtered in XLogInsert().
 


Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi

On Mon, Jan 11, 2021 9:14 AM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> From: Masahiko Sawada <sawada.mshk@gmail.com>
> > I think it's better to have index AM (and perhaps table AM) control it
> > instead of filtering in XLogInsert(). Because otherwise third-party
> > access methods that use LSN like gist indexes cannot write WAL records
> > at all when wal_level = none even if they want.
> 
> Hm, third-party extensions use RM_GENERIC_ID, so it should probably be
> allowed in XLogInsert() as well instead of allowing control in some other way.
> It's not unnatural that WAL records for in-core AMs are filtered in XLogInsert().
I updated the patch following this discussion,
and fixed the documentation as well.
No failure is found during make check-world.


Best Regards,
    Takamichi Osumi

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> I updated the patch following this discussion, and fixed the documentation as
> well.


+          (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
+          (rmid == RM_GENERIC_ID)))

info is wrong for GiST, and one pair of parenthesis is unnecessary.  The above would be:

+          (rmid == RM_GIST_ID && info == XLOG_GIST_ASSIGN_LSN) ||
+          rmid == RM_GENERIC_ID))


Regards
Takayuki Tsunakawa


RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
On Tuesday, January 12, 2021 12:52 PM Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com> wrote:
> From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> > I updated the patch following this discussion, and fixed the
> > documentation as well.
> 
> 
> +          (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
> +          (rmid == RM_GENERIC_ID)))
> 
> info is wrong for GiST, and one pair of parenthesis is unnecessary.  The above
> would be:
> 
> +          (rmid == RM_GIST_ID && info ==
> XLOG_GIST_ASSIGN_LSN) ||
> +          rmid == RM_GENERIC_ID))
Oops, sorry for this careless mistake.
Fixed. And again, regression test produces no failure.

Best Regards,
    Takamichi Osumi

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> Oops, sorry for this careless mistake.
> Fixed. And again, regression test produces no failure.

Now correct.  (Remains ready for committer.)


Regards
Takayuki Tsunakawa


Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Tue, 12 Jan 2021 07:09:28 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in 
> On Tuesday, January 12, 2021 12:52 PM Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com> wrote:
> > From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> > > I updated the patch following this discussion, and fixed the
> > > documentation as well.
> > 
> > 
> > +          (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
> > +          (rmid == RM_GENERIC_ID)))
> > 
> > info is wrong for GiST, and one pair of parenthesis is unnecessary.  The above
> > would be:
> > 
> > +          (rmid == RM_GIST_ID && info ==
> > XLOG_GIST_ASSIGN_LSN) ||
> > +          rmid == RM_GENERIC_ID))
> Oops, sorry for this careless mistake.
> Fixed. And again, regression test produces no failure.

@@ -449,6 +450,18 @@ XLogInsert(RmgrId rmid, uint8 info)
         return EndPos;
     }
 
+    /* Issues only limited types of WAL when wal logging is disabled */
+    if (wal_level == WAL_LEVEL_NONE &&
+        !((rmid == RM_XLOG_ID && info == XLOG_CHECKPOINT_SHUTDOWN) ||
+          (rmid == RM_XLOG_ID && info == XLOG_PARAMETER_CHANGE) ||
+          (rmid == RM_XACT_ID && info == XLOG_XACT_PREPARE) ||
+          (rmid == RM_GIST_ID && info == XLOG_GIST_ASSIGN_LSN) ||
+          rmid == RM_GENERIC_ID))

As Sawada-san mentioned, this prevents future index AMs from being
pluggable.  Providing an interface would work but seems a bit too
invasive.  The record insertion flags is there for this very purpose.

XLogBeginInsert();
XLogSetRecrodFlags(XLOG_MARK_ESSENTIAL);  # new flag value
XLOGInsert(....);

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Disable WAL logging to speed up data loading

From
movead li
Date:
I read the patch and have two points:

1. I do basebackup for database then switch wal level from logical to none to logical and
of cause I archive the wal segments. Next I do PITR base on the basebackup, as a result
it success startup with a waring said maybe data missed.

Because the 'none' level is to bulkload data, do you think it's good that we still support
recover from a 'none' wal level.

2. I just mark wal_level as 'none' but fail to startup, it success after I drop the publication and
it's subscription,mark max_wal_senders as 0, drop replicate slot. I think it worth to write how 
we can startup a 'none' wal level database in document .

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> XLogBeginInsert();
> XLogSetRecrodFlags(XLOG_MARK_ESSENTIAL);  # new flag value
> XLOGInsert(....);

Oh, sounds like a nice idea.  That's more flexible by allowing WAL-emitting modules to specify which WAL records are
mandatoryeven when wal_level is none. 

For example, gistXLogAssignLSN() adds the above flag like this:

    XLogBeginInsert();
    XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT | XLOG_MARK_ESSENTIAL);
    XLogRegisterData((char *) &dummy, sizeof(dummy));

(Here's a word play - unimportant but essential, what's that?)

And the filter in XLogInsert() becomes:

+    if (wal_level == WAL_LEVEL_NONE &&
+        !((rmid == RM_XLOG_ID && info == XLOG_CHECKPOINT_SHUTDOWN) ||
+          (rmid == RM_XLOG_ID && info == XLOG_PARAMETER_CHANGE) ||
+          (rmid == RM_XACT_ID && info == XLOG_XACT_PREPARE) ||
+          (curinsert_flags & XLOG_MARK_ESSENTIAL)))

Or,

+    if (wal_level == WAL_LEVEL_NONE &&
+         !(curinsert_flags & XLOG_MARK_ESSENTIAL))

and add the new flag when emitting XLOG_CHECKPOINT_ONLINE, XLOG_PARAMETER_CHANGE and XLOG_PREPARE records.  I think
bothhave good reasons: the former centralizes the handling of XACT and XLOG RM WAL records (as the current XLOG module
doesalready), and the latter delegates the decision to each module.  Which would you prefer?  (I kind of like the
former,but this is a weak opinion.) 


Regards
Takayuki Tsunakawa






Re: Disable WAL logging to speed up data loading

From
Kyotaro Horiguchi
Date:
At Wed, 13 Jan 2021 06:01:34 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > XLogBeginInsert();
> > XLogSetRecrodFlags(XLOG_MARK_ESSENTIAL);  # new flag value
> > XLOGInsert(....);
> 
> Oh, sounds like a nice idea.  That's more flexible by allowing WAL-emitting modules to specify which WAL records are
mandatoryeven when wal_level is none.
 
> 
> For example, gistXLogAssignLSN() adds the above flag like this:
> 
>     XLogBeginInsert();
>     XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT | XLOG_MARK_ESSENTIAL);
>     XLogRegisterData((char *) &dummy, sizeof(dummy));
> 
> (Here's a word play - unimportant but essential, what's that?)

Hmm. Food may not be important to someone but it is essential for
survival.  I think this is somethig like that :p

> And the filter in XLogInsert() becomes:
> 
> +    if (wal_level == WAL_LEVEL_NONE &&
> +        !((rmid == RM_XLOG_ID && info == XLOG_CHECKPOINT_SHUTDOWN) ||
> +          (rmid == RM_XLOG_ID && info == XLOG_PARAMETER_CHANGE) ||
> +          (rmid == RM_XACT_ID && info == XLOG_XACT_PREPARE) ||
> +          (curinsert_flags & XLOG_MARK_ESSENTIAL)))
> 
> Or,
> 
> +    if (wal_level == WAL_LEVEL_NONE &&
> +         !(curinsert_flags & XLOG_MARK_ESSENTIAL))
>
> and add the new flag when emitting XLOG_CHECKPOINT_ONLINE,

Yeah, I was going to comment about that.  Skipping the record makes
controlfile filled by a bogus checkpoint location.

> XLOG_PARAMETER_CHANGE and XLOG_PREPARE records.  I think both have
> good reasons: the former centralizes the handling of XACT and XLOG
> RM WAL records (as the current XLOG module does already), and the
> latter delegates the decision to each module.  Which would you
> prefer?  (I kind of like the former, but this is a weak opinion.)

Unfortunately, I prefer the latter as it is simple because it is in a
hot path.  As I think I mentioned upthread, I think the xlog stuff
should refrain from being conscious of things deeper than RMger-ID
level.  One of other reasons is that generally the issuer site is the
authoritative about the importance and essentiality of a record being
issued.  And I don't think it's very good to do the same thing in
different ways at the same time.  Fortunately each type of the recrods
has only few issuer places.

By the way, I noticed that pg_switch_wal() silently skips its
task. Desn't it need to give any sort of ERROR?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> >     XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT |
> XLOG_MARK_ESSENTIAL);
> >     XLogRegisterData((char *) &dummy, sizeof(dummy));
> >
> > (Here's a word play - unimportant but essential, what's that?)
>
> Hmm. Food may not be important to someone but it is essential for
> survival.  I think this is somethig like that :p

Ah, that's a good answer.  I know a person around me who enjoys drinking alcohol but doesn't enjoy eating - he says he
doesn'tcare about taste of food.  So food is unimportant but nutrition is essential for him. 


> Unfortunately, I prefer the latter as it is simple because it is in a
> hot path.  As I think I mentioned upthread, I think the xlog stuff
> should refrain from being conscious of things deeper than RMger-ID
> level.  One of other reasons is that generally the issuer site is the
> authoritative about the importance and essentiality of a record being
> issued.  And I don't think it's very good to do the same thing in
> different ways at the same time.  Fortunately each type of the recrods
> has only few issuer places.

Agreed.


> By the way, I noticed that pg_switch_wal() silently skips its
> task. Desn't it need to give any sort of ERROR?

Osumi-san, can you check this?  (I thought we were aware of this according to Fujii-san's comment.)


Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi


Thank you everyone
On Thursday, January 14, 2021 9:27 AM Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com> wrote:
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > >     XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT |
> > XLOG_MARK_ESSENTIAL);
> > >     XLogRegisterData((char *) &dummy, sizeof(dummy));
> > >
> > > (Here's a word play - unimportant but essential, what's that?)
> >
> > Hmm. Food may not be important to someone but it is essential for
> > survival.  I think this is somethig like that :p
>
> Ah, that's a good answer.  I know a person around me who enjoys drinking
> alcohol but doesn't enjoy eating - he says he doesn't care about taste of food.
> So food is unimportant but nutrition is essential for him.
>
>
> > Unfortunately, I prefer the latter as it is simple because it is in a
> > hot path.  As I think I mentioned upthread, I think the xlog stuff
> > should refrain from being conscious of things deeper than RMger-ID
> > level.  One of other reasons is that generally the issuer site is the
> > authoritative about the importance and essentiality of a record being
> > issued.  And I don't think it's very good to do the same thing in
> > different ways at the same time.  Fortunately each type of the recrods
> > has only few issuer places.
>
> Agreed.
>
>
> > By the way, I noticed that pg_switch_wal() silently skips its task.
> > Desn't it need to give any sort of ERROR?
>
> Osumi-san, can you check this?  (I thought we were aware of this according to
> Fujii-san's comment.)
I updated my patch to take in those feedbacks !
Have a look at the latest patch.


Best Regards,
    Takamichi Osumi


Attachment

RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi, Movead


Thanks for your comments.
> I read the patch and have two points:
> 
> 1. I do basebackup for database then switch wal level from logical to none to
> logical and of cause I archive the wal segments. Next I do PITR base on the
> basebackup, as a result it success startup with a waring said maybe data
> missed.
This applies to wal_level=minimal and is going to be addressed by
another thread [1].

> 
> Because the 'none' level is to bulkload data, do you think it's good that we still
> support recover from a 'none' wal level.
> 
> 2. I just mark wal_level as 'none' but fail to startup, it success after I drop the
> publication and it's subscription,mark max_wal_senders as 0, drop replicate slot.
> I think it worth to write how we can startup a 'none' wal level database in
> document .
The documentation [2] touches this aspect.
It says "Also, wal_level must be set to replica or
higher to allow replication slots to be used." already.
It is documented and applies to wal_level=minimal also
because 'src/backend/replication/slot.c' has code that throws the error you got as below.

    else if (wal_level < WAL_LEVEL_REPLICA)
        ereport(FATAL,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("physical replication slot \"%s\" exists, but wal_level < replica",
                        NameStr(cp.slotdata.name)),
                 errhint("Change wal_level to be replica or higher.")));

[1] -
https://www.postgresql.org/message-id/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90%40OSBPR01MB4888.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/docs/13/runtime-config-replication.html

Best Regards,
    Takamichi Osumi

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> I updated my patch to take in those feedbacks !
> Have a look at the latest patch.

Looks good to me (the status remains ready for committer.)


Regards
Takayuki Tsunakawa




Re: Disable WAL logging to speed up data loading

From
David Steele
Date:
On 1/17/21 12:16 AM, tsunakawa.takay@fujitsu.com wrote:
> From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
>> I updated my patch to take in those feedbacks !
>> Have a look at the latest patch.
> 
> Looks good to me (the status remains ready for committer.)

After reading through the thread (but not reading the patch) I am -1 on  
this proposal.

The feature seems ripe for abuse and misunderstanding, and as has been  
noted in the thread, there are a variety of alternatives that can  
provide a similar effect.

It doesn't help that at several points along the way new WAL records  
have been found that still need to be included even when wal_level =  
none. It's not clear to me how we know when we have found them all.

The patch is marked Ready for Committer but as far as I can see there  
are no committers in favor of it and quite a few who are not.

Perhaps it would be better to look at some of the more targeted  
approaches mentioned in the thread and see if any of them can be  
used/improved to achieve the desired result?

Regards,
-- 
-David
david@pgmasters.net



RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: David Steele <david@pgmasters.net>
> After reading through the thread (but not reading the patch) I am -1 on
> this proposal.
>
> The feature seems ripe for abuse and misunderstanding, and as has been
> noted in the thread, there are a variety of alternatives that can
> provide a similar effect.
>
> It doesn't help that at several points along the way new WAL records
> have been found that still need to be included even when wal_level =
> none. It's not clear to me how we know when we have found them all.
>
> The patch is marked Ready for Committer but as far as I can see there
> are no committers in favor of it and quite a few who are not.

I can understand that people are worried about not having WAL.  But as far as I remember, I'm afraid those concerns
wereemotional, not logical, i.e., something like "something may happen.".  Regarding concrete concerns that
Stephen-san,Magnus-san, Horiguchi-san, Sawada-san and others raised, Osumi-san addressed them based on their advice and
review,both in this thread and other threads. 

I also understand we want to value people's emotion for worry-free PostgreSQL.  At the same time, I'd like the emotion
understoodthat we want Postgres to have this convenient, easy-to-use feature.  MySQL recently introduced this feature.
Whycan't Postgres do it? 


> Perhaps it would be better to look at some of the more targeted
> approaches mentioned in the thread and see if any of them can be
> used/improved to achieve the desired result?

Other methods are not as easy-to-use, and more complex to implement.

What kind of destiny does this type of feature end up in?


    Regards
Takayuki Tsunakawa}




Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Fri, 2021-03-19 at 06:53 +0000, tsunakawa.takay@fujitsu.com wrote:
> From: David Steele <david@pgmasters.net>
> > After reading through the thread (but not reading the patch) I am -1 on
> > this proposal.
> > 
> > The feature seems ripe for abuse and misunderstanding, and as has been
> > noted in the thread, there are a variety of alternatives that can
> > provide a similar effect.
> > 
> > It doesn't help that at several points along the way new WAL records
> > have been found that still need to be included even when wal_level =
> > none. It's not clear to me how we know when we have found them all.
> > 
> > The patch is marked Ready for Committer but as far as I can see there
> > are no committers in favor of it and quite a few who are not.
> 
> I can understand that people are worried about not having WAL.  But as far
>  as I remember, I'm afraid those concerns were emotional, not logical,
>  i.e., something like "something may happen.".  Regarding concrete concerns
>  that Stephen-san, Magnus-san, Horiguchi-san, Sawada-san and others raised,
>  Osumi-san addressed them based on their advice and review, both in this
>  thread and other threads.
> 
> I also understand we want to value people's emotion for worry-free PostgreSQL.
> At the same time, I'd like the emotion understood that we want Postgres to
>  have this convenient, easy-to-use feature.  MySQL recently introduced this
>  feature.  Why can't Postgres do it?

Part of what gives PostgreSQL the good name it has is that we strive to
make it very hard to mess up your data.  We are finicky about encoding
correctness, we don't allow you to drop a table used by a view, and so on.

All these things are annoying to users, but we'd rather take that than
the complaints that a database got corrupted because somebody didn't read
the documentation carefully.

Now I'm not saying that this feature should not go in (I set it to
"ready for committer", because I see no technical flaw with the
implementation), but it remains debatable if we want the feature or not.

I certainly can see David's point of view.  And we don't view MySQL as
a role model that we want to emulate.

> > Perhaps it would be better to look at some of the more targeted
> > approaches mentioned in the thread and see if any of them can be
> > used/improved to achieve the desired result?
> 
> Other methods are not as easy-to-use, and more complex to implement.

Fast and loose often takes less effort, yes.

Yours,
Laurenz Albe




RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Laurenz Albe <laurenz.albe@cybertec.at>
> Now I'm not saying that this feature should not go in (I set it to
> "ready for committer", because I see no technical flaw with the
> implementation), but it remains debatable if we want the feature or not.

Oh, yes, thank you very much for supporting this and other relevant two threads!


> I certainly can see David's point of view.  And we don't view MySQL as
> a role model that we want to emulate.

Yes, what MySQL was over ten years ago would not be a role model for us.  OTOH, recent MySQL under Oracle should be
improvingmuch -- adopting InnoDB as a default storage engine, transactional data dictionary, etc.  (Somewhat offtopic,
buttheir documentation quality is great.)
 


> All these things are annoying to users, but we'd rather take that than
> the complaints that a database got corrupted because somebody didn't read
> the documentation carefully.

Hmm, if that were the case, then some people would say the unlogged-table based approach is also be dangerous, saying
"Usersdon't read the manual carefully and easily think that making all tables unlogged is good for performance."
 


    Regards
Takayuki Tsunakawa


Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> From: David Steele <david@pgmasters.net>
> > After reading through the thread (but not reading the patch) I am -1 on
> > this proposal.
> >
> > The feature seems ripe for abuse and misunderstanding, and as has been
> > noted in the thread, there are a variety of alternatives that can
> > provide a similar effect.
> >
> > It doesn't help that at several points along the way new WAL records
> > have been found that still need to be included even when wal_level =
> > none. It's not clear to me how we know when we have found them all.
> >
> > The patch is marked Ready for Committer but as far as I can see there
> > are no committers in favor of it and quite a few who are not.
>
> I can understand that people are worried about not having WAL.  But as far as I remember, I'm afraid those concerns
wereemotional, not logical, i.e., something like "something may happen.".  Regarding concrete concerns that
Stephen-san,Magnus-san, Horiguchi-san, Sawada-san and others raised, Osumi-san addressed them based on their advice and
review,both in this thread and other threads. 
>
> I also understand we want to value people's emotion for worry-free PostgreSQL.  At the same time, I'd like the
emotionunderstood that we want Postgres to have this convenient, easy-to-use feature.  MySQL recently introduced this
feature. Why can't Postgres do it? 

I don't see there being anything particularly 'emotional' around
suggesting that we already have multiple ways to avoid WAL writes when
populating tables and suggesting that we work to improve those instead
of adding yet another WAL level (note: we've been removing them of late,
for the good reason that having more is confusing and not helpful).

> > Perhaps it would be better to look at some of the more targeted
> > approaches mentioned in the thread and see if any of them can be
> > used/improved to achieve the desired result?
>
> Other methods are not as easy-to-use, and more complex to implement.

The argument here seems to stem from the idea that issueing a 'TRUNCATE'
inside the transaction before starting the 'COPY' command is 'too hard'.
That's clearly a subjective question and perhaps it's my emotional
response to that which is at issue, but I don't think I'm alone in the
feeling that just isn't enough of a justification for adding a new WAL
level.

> What kind of destiny does this type of feature end up in?

I could be wrong and perhaps opinions will change in the future, but it
really doesn't seem like the there's much support for adding a new WAL
level just to avoid doing the TRUNCATE.  Appealing to what other
database systems support can be helpful in some cases but that's usually
when we're looking at a new capability which multiple other systems have
implemented.  This isn't actually a new capability at all- the WAL level
that you're looking for already exists and already gives the
optimization you're looking for, as long as you issue a TRUNCATE at the
start of the transaction.

Thanks,

Stephen

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Stephen Frost <sfrost@snowman.net>
> The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> inside the transaction before starting the 'COPY' command is 'too hard'.


> I could be wrong and perhaps opinions will change in the future, but it really
> doesn't seem like the there's much support for adding a new WAL level just to
> avoid doing the TRUNCATE.  Appealing to what other database systems
> support can be helpful in some cases but that's usually when we're looking at a
> new capability which multiple other systems have implemented.  This isn't
> actually a new capability at all- the WAL level that you're looking for already
> exists and already gives the optimization you're looking for, as long as you issue
> a TRUNCATE at the start of the transaction.

No, we can't ask using TRUNCATE because the user wants to add data to a table.


    Regards
Takayuki     Tsunakawa





Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> From: Stephen Frost <sfrost@snowman.net>
> > The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> > inside the transaction before starting the 'COPY' command is 'too hard'.
>
> > I could be wrong and perhaps opinions will change in the future, but it really
> > doesn't seem like the there's much support for adding a new WAL level just to
> > avoid doing the TRUNCATE.  Appealing to what other database systems
> > support can be helpful in some cases but that's usually when we're looking at a
> > new capability which multiple other systems have implemented.  This isn't
> > actually a new capability at all- the WAL level that you're looking for already
> > exists and already gives the optimization you're looking for, as long as you issue
> > a TRUNCATE at the start of the transaction.
>
> No, we can't ask using TRUNCATE because the user wants to add data to a table.

First- what are you expecting would actually happen during crash
recovery in this specific case with your proposed new WAL level?

Second, use partitioning, or unlogged tables (with the patch discussed
elsewhere to allow flipping them to logged without writing the entire
thing to WAL).

Thanks,

Stephen

Attachment

Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Mon, 2021-03-22 at 09:46 -0400, Stephen Frost wrote:
> * tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> > From: Stephen Frost <sfrost@snowman.net>
> > > The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> > > inside the transaction before starting the 'COPY' command is 'too hard'.
> > 
> > No, we can't ask using TRUNCATE because the user wants to add data to a table.
> 
> First- what are you expecting would actually happen during crash
> recovery in this specific case with your proposed new WAL level?
> 
> Second, use partitioning, or unlogged tables (with the patch discussed
> elsewhere to allow flipping them to logged without writing the entire
> thing to WAL).

Perhaps allowing to set unlogged tables to logged ones without writing WAL
is a more elegant way to do that, but I cannot see how that would be any
more crash safe than this patch.  Besides, the feature doesn't exist yet.

So I think that argument doesn't carry much weight.

Yours,
Laurenz Albe




Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> On Mon, 2021-03-22 at 09:46 -0400, Stephen Frost wrote:
> > * tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> > > From: Stephen Frost <sfrost@snowman.net>
> > > > The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> > > > inside the transaction before starting the 'COPY' command is 'too hard'.
> > >
> > > No, we can't ask using TRUNCATE because the user wants to add data to a table.
> >
> > First- what are you expecting would actually happen during crash
> > recovery in this specific case with your proposed new WAL level?
> >
> > Second, use partitioning, or unlogged tables (with the patch discussed
> > elsewhere to allow flipping them to logged without writing the entire
> > thing to WAL).
>
> Perhaps allowing to set unlogged tables to logged ones without writing WAL
> is a more elegant way to do that, but I cannot see how that would be any
> more crash safe than this patch.  Besides, the feature doesn't exist yet.

I'm not suggesting it's somehow more crash safe- but it's at least very
clear what happens in such a case, to wit: the entire table is cleared
on crash recovery.

> So I think that argument doesn't carry much weight.

We're talking about two different ways to accomplish essentially the
same thing- one which introduces a new WAL level, vs. one which adds an
optimization for a WAL level we already have.  That the second is more
elegant is more-or-less entirely the point I'm making here, so it seems
pretty relevant.

Thanks,

Stephen

Attachment

Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Mon, 2021-03-22 at 11:05 -0400, Stephen Frost wrote:
> > Perhaps allowing to set unlogged tables to logged ones without writing WAL
> > is a more elegant way to do that, but I cannot see how that would be any
> > more crash safe than this patch.  Besides, the feature doesn't exist yet.
> 
> I'm not suggesting it's somehow more crash safe- but it's at least very
> clear what happens in such a case, to wit: the entire table is cleared
> on crash recovery.

I didn't look at the patch, but are you saying that after changing the
table to LOGGED, it somehow remembers that it is not crash safe and is
emptied if there is a crash before the next checkpoint?

Wouldn't that cause corruption if you restore from an earlier backup?
At least with the feature in this thread we'd get an error on recovery.

Yours,
Laurenz Albe




Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> On Mon, 2021-03-22 at 11:05 -0400, Stephen Frost wrote:
> > > Perhaps allowing to set unlogged tables to logged ones without writing WAL
> > > is a more elegant way to do that, but I cannot see how that would be any
> > > more crash safe than this patch.  Besides, the feature doesn't exist yet.
> >
> > I'm not suggesting it's somehow more crash safe- but it's at least very
> > clear what happens in such a case, to wit: the entire table is cleared
> > on crash recovery.
>
> I didn't look at the patch, but are you saying that after changing the
> table to LOGGED, it somehow remembers that it is not crash safe and is
> emptied if there is a crash before the next checkpoint?

I'm not sure where the confusion is, but certainly once the table has
been turned into LOGGED and that's been committed then it should be
crash safe, even under 'minimal' with the optimization to avoid actually
copying the entire table into the WAL.  I don't see any particular
reason why that isn't possible to do.

Under the proposed 'none', you basically have to throw out the entire
cluster on a crash, all because you don't want to use 'UNLOGGED' when
you created the tables you want to load data into, or 'TRUNCATE' them in
the transaction where you start the data load, either of which gives us
enough indication and which we have infrastructure around dealing with
in the event of a crash during the load without everything else having
to be tossed and everything restored from a backup.  That's both a
better user experience from the perspective of having fewer WAL levels
to understand and from just a general administration perspective so you
don't have to go all the way back to a backup to bring the system back
up.

Thanks,

Stephen

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Stephen Frost <sfrost@snowman.net>
> First- what are you expecting would actually happen during crash recovery in
> this specific case with your proposed new WAL level?
...
> I'm not suggesting it's somehow more crash safe- but it's at least very clear
> what happens in such a case, to wit: the entire table is cleared on crash
> recovery.

As Laurenz-san kindly replied, the database server refuses to start with a clear message.  So, it's similarly very
clearwhat happens.  The user will never unknowingly resume operation with possibly corrupt data. 


> We're talking about two different ways to accomplish essentially the same
> thing- one which introduces a new WAL level, vs. one which adds an
> optimization for a WAL level we already have.  That the second is more elegant
> is more-or-less entirely the point I'm making here, so it seems pretty relevant.

So, I understood the point boils down to elegance.  Could I ask what makes you feel ALTER TABLE UNLOGGED/LOGGED is
(more)elegant?  I'm purely asking as a user. 

(I don't want to digress, but if we consider the number of options for wal_level as an issue, I feel it's not elegant
tohave separate "replica" and "logical".) 


> Under the proposed 'none', you basically have to throw out the entire cluster on
> a crash, all because you don't want to use 'UNLOGGED' when you created the
> tables you want to load data into, or 'TRUNCATE' them in the transaction where
> you start the data load, either of which gives us enough indication and which
> we have infrastructure around dealing with in the event of a crash during the
> load without everything else having to be tossed and everything restored from a
> backup.  That's both a better user experience from the perspective of having
> fewer WAL levels to understand and from just a general administration
> perspective so you don't have to go all the way back to a backup to bring the
> system back up.

The elegance of wal_level = none is that the user doesn't have to remember to add ALTER TABLE to the data loading job
whenthey add load target tables/partitions.  If they build and use their own (shell) scripts to load data, that won't
beburdon or forgotten.  But what would they have to do when they use ETL tools like Talend, Pentaho, and Informatica
PowerCenter?  Do those tools allow users to add custom processing like ALTER TABLE to the data loading job steps for
eachtable?  (AFAIK, not.) 

wal_level = none is convenient and attractive for users who can backup and restore the entire database instantly with a
storageor filesystem snapshot feature. 


    Regards
Takayuki     Tsunakawa





Re: Disable WAL logging to speed up data loading

From
Laurenz Albe
Date:
On Mon, 2021-03-22 at 13:22 -0400, Stephen Frost wrote:
> * Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> > On Mon, 2021-03-22 at 11:05 -0400, Stephen Frost wrote:
> > > > Perhaps allowing to set unlogged tables to logged ones without writing WAL
> > > > is a more elegant way to do that, but I cannot see how that would be any
> > > > more crash safe than this patch.  Besides, the feature doesn't exist yet.
> > > 
> > > I'm not suggesting it's somehow more crash safe- but it's at least very
> > > clear what happens in such a case, to wit: the entire table is cleared
> > > on crash recovery.
> > 
> > I didn't look at the patch, but are you saying that after changing the
> > table to LOGGED, it somehow remembers that it is not crash safe and is
> > emptied if there is a crash before the next checkpoint?
> 
> I'm not sure where the confusion is, but certainly once the table has
> been turned into LOGGED and that's been committed then it should be
> crash safe, even under 'minimal' with the optimization to avoid actually
> copying the entire table into the WAL.  I don't see any particular
> reason why that isn't possible to do.

The confusion is probably caused by my ignorance.
If the actual data files of the table are forced out to disk on COMMIT,
that should indeed work.  And if you fsync a file that has not been changed
since it was last persisted, that should be cheap.

So if I got that right, I agree with you that that would be a much better
way to achieve the goal than wal_level = 'none'.

The only drawback is that we don't have that feature yet...

Yours,
Laurenz Albe




Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> From: Stephen Frost <sfrost@snowman.net>
> > First- what are you expecting would actually happen during crash recovery in
> > this specific case with your proposed new WAL level?
> ...
> > I'm not suggesting it's somehow more crash safe- but it's at least very clear
> > what happens in such a case, to wit: the entire table is cleared on crash
> > recovery.
>
> As Laurenz-san kindly replied, the database server refuses to start with a clear message.  So, it's similarly very
clearwhat happens.  The user will never unknowingly resume operation with possibly corrupt data. 

No, instead they'll just have a completely broken system that has to be
rebuilt or restored from a backup.  That doesn't strike me as a good
result.

> > We're talking about two different ways to accomplish essentially the same
> > thing- one which introduces a new WAL level, vs. one which adds an
> > optimization for a WAL level we already have.  That the second is more elegant
> > is more-or-less entirely the point I'm making here, so it seems pretty relevant.
>
> So, I understood the point boils down to elegance.  Could I ask what makes you feel ALTER TABLE UNLOGGED/LOGGED is
(more)elegant?  I'm purely asking as a user. 

The impact is localized to those specific tables.  The rest of the
system should come up cleanly and there won't be corruption, instead
merely the lack of data in UNLOGGED tables.

> (I don't want to digress, but if we consider the number of options for wal_level as an issue, I feel it's not elegant
tohave separate "replica" and "logical".) 

Do you know of a way to avoid having those two distinct levels while
still writing only the WAL needed depending on if a system is doing
logical replication or not..?  If you've got suggestions on how to
eliminate one of those levels, I'm sure there would be interest in doing
so.  I don't see the fact that we have those two levels as justification
for adding another spelling of 'minimal'.

> > Under the proposed 'none', you basically have to throw out the entire cluster on
> > a crash, all because you don't want to use 'UNLOGGED' when you created the
> > tables you want to load data into, or 'TRUNCATE' them in the transaction where
> > you start the data load, either of which gives us enough indication and which
> > we have infrastructure around dealing with in the event of a crash during the
> > load without everything else having to be tossed and everything restored from a
> > backup.  That's both a better user experience from the perspective of having
> > fewer WAL levels to understand and from just a general administration
> > perspective so you don't have to go all the way back to a backup to bring the
> > system back up.
>
> The elegance of wal_level = none is that the user doesn't have to remember to add ALTER TABLE to the data loading job
whenthey add load target tables/partitions.  If they build and use their own (shell) scripts to load data, that won't
beburdon or forgotten.  But what would they have to do when they use ETL tools like Talend, Pentaho, and Informatica
PowerCenter?  Do those tools allow users to add custom processing like ALTER TABLE to the data loading job steps for
eachtable?  (AFAIK, not.) 

I don't buy the argument that having to 'remember' to do an ALTER TABLE
is such a burden when it means that the database will still be
consistent and operational after a crash.

As for data loading tools, surely they support loading data into
UNLOGGED tables and it's certainly not hard to have a script run around
and flip those tables to LOGGED after they're loaded, and I do actually
believe some of those tools support building processes of which one step
could be such a command (I'm fairly confident Pentaho, in particular,
does as I remember building such pipelines myself...).

Thanks,

Stephen

Attachment

RE: Disable WAL logging to speed up data loading

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Stephen Frost <sfrost@snowman.net>
> * tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> > As Laurenz-san kindly replied, the database server refuses to start with a
> clear message.  So, it's similarly very clear what happens.  The user will never
> unknowingly resume operation with possibly corrupt data.
>
> No, instead they'll just have a completely broken system that has to be rebuilt or
> restored from a backup.  That doesn't strike me as a good result.

Your understanding is correct.  What I wanted to answer to confirm is that the behavior is clear: the server refuses to
start,and the user knows he/she should restore the database from backup. 


> > So, I understood the point boils down to elegance.  Could I ask what makes
> you feel ALTER TABLE UNLOGGED/LOGGED is (more) elegant?  I'm purely
> asking as a user.
>
> The impact is localized to those specific tables.  The rest of the system should
> come up cleanly and there won't be corruption, instead merely the lack of data
> in UNLOGGED tables.

So, I took your point as the ease and fast time of restore after a crash: the user just has to restore the lost table
datausing COPY FROM from files that was saved before the data loading job using COPY TO. 

In that sense, the backup and restoration of the whole database is an option for users when they have some instant
backupand restore feature available. 


> > (I don't want to digress, but if we consider the number of options for
> > wal_level as an issue, I feel it's not elegant to have separate
> > "replica" and "logical".)
>
> Do you know of a way to avoid having those two distinct levels while still writing
> only the WAL needed depending on if a system is doing logical replication or
> not..?  If you've got suggestions on how to eliminate one of those levels, I'm
> sure there would be interest in doing so.  I don't see the fact that we have
> those two levels as justification for adding another spelling of 'minimal'.

Sorry, I have almost no knowledge of logical replication implementation.  So, being ignorant of its intricacies, I have
feltlike as a user "Why do I have to set wal_level = logical, because streaming replication and logical replication are
bothreplication features?  If the implementation needs some additional WAL for logical replication, why doesn't the
serverautomatically emit the WAL when the target table of DML statements is in a publication?" 


> > The elegance of wal_level = none is that the user doesn't have to
> > remember to add ALTER TABLE to the data loading job when they add load
> > target tables/partitions.  If they build and use their own (shell)
> > scripts to load data, that won't be burdon or forgotten.  But what
> > would they have to do when they use ETL tools like Talend, Pentaho,
> > and Informatica Power Center?  Do those tools allow users to add
> > custom processing like ALTER TABLE to the data loading job steps for
> > each table?  (AFAIK, not.)
>
> I don't buy the argument that having to 'remember' to do an ALTER TABLE is
> such a burden when it means that the database will still be consistent and
> operational after a crash.

That depends on whether an instant backup and restore feature is at hand.  If the user is comfortable with it,
wal_level= none is easier and more attractive.  At least, I don't want the feature to be denied. 


> As for data loading tools, surely they support loading data into UNLOGGED
> tables and it's certainly not hard to have a script run around and flip those
> tables to LOGGED after they're loaded, and I do actually believe some of those
> tools support building processes of which one step could be such a command
> (I'm fairly confident Pentaho, in particular, does as I remember building such
> pipelines myself...).

Oh, Pentaho has such a feature, doesn't it?  But isn't it a separate step from the data output step?  Here, I assume
ETLtools allow users to compose a data loading job from multiple steps: data input, transformation, data output, etc.
Iguess the user can't directly incorporate ALTER TABLE into the data output step, and has to add separate custom steps
forALTER TABLE.  That's burdonsome and forgettable, I think. 


    Regards
Takayuki     Tsunakawa





Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> From: Stephen Frost <sfrost@snowman.net>
> > * tsunakawa.takay@fujitsu.com (tsunakawa.takay@fujitsu.com) wrote:
> > As for data loading tools, surely they support loading data into UNLOGGED
> > tables and it's certainly not hard to have a script run around and flip those
> > tables to LOGGED after they're loaded, and I do actually believe some of those
> > tools support building processes of which one step could be such a command
> > (I'm fairly confident Pentaho, in particular, does as I remember building such
> > pipelines myself...).
>
> Oh, Pentaho has such a feature, doesn't it?  But isn't it a separate step from the data output step?  Here, I assume
ETLtools allow users to compose a data loading job from multiple steps: data input, transformation, data output, etc.
Iguess the user can't directly incorporate ALTER TABLE into the data output step, and has to add separate custom steps
forALTER TABLE.  That's burdonsome and forgettable, I think. 

None of the arguments presented here has done anything to change my
opinion that adding a 'none' WAL level is a bad idea.

Thanks,

Stephen

Attachment

RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
Hi


Mainly affected by a commit 9de9294,
I've fixed minor things to rebase the patch.
All modifications I did are cosmetic changes and
a little bit of documentation updates.
Please have a look at attached v09.

Best Regards,
    Takamichi Osumi


Attachment

Re: Disable WAL logging to speed up data loading

From
vignesh C
Date:
On Wed, Apr 7, 2021 at 12:13 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hi
>
>
> Mainly affected by a commit 9de9294,
> I've fixed minor things to rebase the patch.
> All modifications I did are cosmetic changes and
> a little bit of documentation updates.
> Please have a look at attached v09.
>

Patch is not applying on Head, kindly post a rebased version. As this
patch is in Ready for Committer state, it will help one of the
committers to pick up this patch during commit fest.

Regards,
Vignesh



Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* vignesh C (vignesh21@gmail.com) wrote:
> On Wed, Apr 7, 2021 at 12:13 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> > Mainly affected by a commit 9de9294,
> > I've fixed minor things to rebase the patch.
> > All modifications I did are cosmetic changes and
> > a little bit of documentation updates.
> > Please have a look at attached v09.
>
> Patch is not applying on Head, kindly post a rebased version. As this
> patch is in Ready for Committer state, it will help one of the
> committers to pick up this patch during commit fest.

Unless there's actually a committer who is interested and willing to
take this on (understanding that multiple committers have already said
they don't agree with this approach), I don't think it makes sense to
spend additional time on this.

Rather than RfC, the appropriate status seems like it should be
Rejected, as otherwise it's just encouraging someone to ultimately waste
their time rebasing and updating the patch when it isn't going to ever
actually be committed.

Thanks,

Stephen

Attachment

Re: Disable WAL logging to speed up data loading

From
Michael Paquier
Date:
On Sun, Jul 04, 2021 at 11:02:01AM -0400, Stephen Frost wrote:
> Rather than RfC, the appropriate status seems like it should be
> Rejected, as otherwise it's just encouraging someone to ultimately waste
> their time rebasing and updating the patch when it isn't going to ever
> actually be committed.

Yes, agreed with that.
--
Michael

Attachment

RE: Disable WAL logging to speed up data loading

From
"osumi.takamichi@fujitsu.com"
Date:
On Monday, July 5, 2021 10:32 AM Michael Paquier wrote:
> On Sun, Jul 04, 2021 at 11:02:01AM -0400, Stephen Frost wrote:
> > Rather than RfC, the appropriate status seems like it should be
> > Rejected, as otherwise it's just encouraging someone to ultimately
> > waste their time rebasing and updating the patch when it isn't going
> > to ever actually be committed.
>
> Yes, agreed with that.
Thanks for paying attention to this thread.

This cannot be helped but I've made the patch status as you suggested,
because I judged the community would not accept this idea.

Best Regards,
    Takamichi Osumi




Re: Disable WAL logging to speed up data loading

From
Stephen Frost
Date:
Greetings,

* osumi.takamichi@fujitsu.com (osumi.takamichi@fujitsu.com) wrote:
> On Monday, July 5, 2021 10:32 AM Michael Paquier wrote:
> > On Sun, Jul 04, 2021 at 11:02:01AM -0400, Stephen Frost wrote:
> > > Rather than RfC, the appropriate status seems like it should be
> > > Rejected, as otherwise it's just encouraging someone to ultimately
> > > waste their time rebasing and updating the patch when it isn't going
> > > to ever actually be committed.
> >
> > Yes, agreed with that.
> Thanks for paying attention to this thread.
>
> This cannot be helped but I've made the patch status as you suggested,
> because I judged the community would not accept this idea.

Thanks.  Hopefully this will encourage those interested in minimal WAL
logging for data loading performance to instead work on additional
optimizations for the existing 'minimal' WAL level.

Thanks again,

Stephen

Attachment