Thread: v13: CLUSTER segv with wal_level=minimal and parallel index creation
Following a bulk load, a CLUSTER command run by a maintenance script crashed. This is currently reproducible on that instance, so please let me know if I can provide more info.

< 2020-09-06 15:44:16.369 MDT >LOG:  background worker "parallel worker" (PID 2576) was terminated by signal 6: Aborted
< 2020-09-06 15:44:16.369 MDT >DETAIL:  Failed process was running: CLUSTER pg_attribute USING pg_attribute_relid_attnam_index

The crash happens during:

ts=# REINDEX INDEX pg_attribute_relid_attnum_index;

..but not:

ts=# REINDEX INDEX pg_attribute_relid_attnam_index;

 pg_catalog | pg_attribute_relid_attnam_index | index | postgres | pg_attribute | permanent | 31 MB |
 pg_catalog | pg_attribute_relid_attnum_index | index | postgres | pg_attribute | permanent | 35 MB |

I suspect commit c6b92041d, "Skip WAL for new relfilenodes, under wal_level=minimal." In fact, I set wal_level=minimal for the bulk load. Note also:

 override           | data_checksums       | on
 configuration file | checkpoint_timeout   | 60
 configuration file | maintenance_work_mem | 1048576
 configuration file | max_wal_senders      | 0
 configuration file | wal_compression      | on
 configuration file | wal_level            | minimal
 configuration file | fsync                | off
 configuration file | full_page_writes     | off
 default            | server_version       | 13beta3

(gdb) bt
#0  0x00007ff9999ad387 in raise () from /lib64/libc.so.6
#1  0x00007ff9999aea78 in abort () from /lib64/libc.so.6
#2  0x0000000000921da5 in ExceptionalCondition (conditionName=conditionName@entry=0xad4078 "relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", errorType=errorType@entry=0x977f49 "FailedAssertion", fileName=fileName@entry=0xad3068 "relcache.c", lineNumber=lineNumber@entry=2976) at assert.c:67
#3  0x000000000091a08b in AssertPendingSyncConsistency (relation=0x7ff99c2a70b8) at relcache.c:2976
#4  AssertPendingSyncs_RelationCache () at relcache.c:3036
#5  0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
#6  0x000000000053b1a4 in CommitTransaction () at xact.c:2118
#7  0x000000000053b826 in EndParallelWorkerTransaction () at xact.c:5300
#8  0x000000000052fcf7 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1479
#9  0x000000000076047a in StartBackgroundWorker () at bgworker.c:813
#10 0x000000000076d88d in do_start_bgworker (rw=0x23ac110) at postmaster.c:5865
#11 maybe_start_bgworkers () at postmaster.c:6091
#12 0x000000000076e43e in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5260
#13 <signal handler called>
#14 0x00007ff999a6c983 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000004887bc in ServerLoop () at postmaster.c:1691
#16 0x000000000076fb45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x237d280) at postmaster.c:1400
#17 0x000000000048a83d in main (argc=3, argv=0x237d280) at main.c:210

(gdb) bt f
...
#4  AssertPendingSyncs_RelationCache () at relcache.c:3036
        status = {hashp = 0x23cba50, curBucket = 449, curEntry = 0x0}
        locallock = <optimized out>
        rels = 0x23ff018
        maxrels = <optimized out>
        nrels = 0
        idhentry = <optimized out>
        i = <optimized out>
#5  0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
        pending = <optimized out>
        nrels = 0
        maxrels = 0
        srels = 0x0
        scan = {hashp = 0x23edf60, curBucket = 9633000, curEntry = 0xe01600 <TopTransactionStateData>}
        pendingsync = <optimized out>
#6  0x000000000053b1a4 in CommitTransaction () at xact.c:2118
        s = 0xe01600 <TopTransactionStateData>
        latestXid = <optimized out>
        is_parallel_worker = true
        __func__ = "CommitTransaction"
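For reference, the failing assertion compares two ways of answering "does this relfilenode skip WAL?" — one derived from the relcache entry, one from the pending-syncs hash. Roughly, from relcache.c as of 13beta3 (an abridged sketch, not the verbatim source):

static void
AssertPendingSyncConsistency(Relation relation)
{
	/* verdict derived from the relcache entry alone */
	bool		relcache_verdict =
		relation->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&
		((relation->rd_createSubid != InvalidSubTransactionId &&
		  RELKIND_HAS_STORAGE(relation->rd_rel->relkind)) ||
		 relation->rd_firstRelfilenodeSubid != InvalidSubTransactionId);

	/* verdict derived from the pending-syncs hash; the two must agree */
	Assert(relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node));
}

The abort means the two sources of truth disagree inside a parallel worker at commit time.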
This is easily reproduced, at least on pg_attribute:

[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/initdb -D pgsql13.dat
[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/postgres -D pgsql13.dat -c logging_collector=off -c port=5678 -c unix_socket_directories=/tmp -c wal_level=minimal -c max_wal_senders=0 &
[pryzbyj@localhost ~]$ psql -h /tmp -p 5678 postgres -c "SET min_parallel_table_scan_size=0" -c "SET client_min_messages=debug" -c "REINDEX INDEX pg_attribute_relid_attnum_index"
SET
SET
DEBUG:  building index "pg_attribute_relid_attnum_index" on table "pg_attribute" with request for 1 parallel worker
TRAP: FailedAssertion("relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", File: "relcache.c", Line: 2976)
postgres: parallel worker for PID 9637 (ExceptionalCondition+0x66)[0x921d86]
postgres: parallel worker for PID 9637 (AssertPendingSyncs_RelationCache+0x1db)[0x91a08b]
postgres: parallel worker for PID 9637 (smgrDoPendingSyncs+0x71)[0x58e591]
postgres: parallel worker for PID 9637 [0x53b1a4]
postgres: parallel worker for PID 9637 (EndParallelWorkerTransaction+0x16)[0x53b826]
postgres: parallel worker for PID 9637 (ParallelWorkerMain+0x437)[0x52fcf7]
postgres: parallel worker for PID 9637 (StartBackgroundWorker+0x25a)[0x76047a]
postgres: parallel worker for PID 9637 [0x76d88d]
postgres: parallel worker for PID 9637 [0x76e43e]
/lib64/libpthread.so.0(+0xf630)[0x7f46d4b47630]
/lib64/libc.so.6(__select+0x13)[0x7f46d26ba983]
postgres: parallel worker for PID 9637 [0x4887bc]
postgres: parallel worker for PID 9637 (PostmasterMain+0x1165)[0x76fb45]
postgres: parallel worker for PID 9637 (main+0x70d)[0x48a83d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f46d25e7555]

--
Justin
Justin Pryzby <pryzby@telsasoft.com> writes:
> This is easily reproduced, at least on pg_attribute

Yeah, duplicated here.  The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.

I did not check to see what happens in a non-assert build.

			regards, tom lane
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
From: Kyotaro Horiguchi
At Mon, 07 Sep 2020 00:33:03 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > This is easily reproduced, at least on pg_attribute

Thanks for the repro.

> Yeah, duplicated here.  The assertion (not SIGSEGV) is in code
> added by c6b92041d, so it seems clearly an open issue for v13.
>
> I did not check to see what happens in a non-assert build.

The cause is that the worker correctly received the pending-sync entry
but never created a relcache entry for the relation using
RelationBuildDesc. So rd_firstRelfilenodeSubid is not correctly set.

I'm investigating it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
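The initialization the worker misses is the pending-syncs check that RelationBuildDesc() performs after computing the physical address. Roughly, in the v13 sources before the fix discussed below (an abridged sketch):

	/* in RelationBuildDesc(), after RelationInitPhysicalAddr(relation): */
	if (RelFileNodeSkippingWAL(relation->rd_node))
		relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
	else
		relation->rd_firstRelfilenodeSubid = InvalidSubTransactionId;

A worker that obtains the entry from the relcache init file instead of RelationBuildDesc() never runs this, so rd_firstRelfilenodeSubid stays invalid even though the leader registered a pending sync for the new relfilenode.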
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
From: Kyotaro Horiguchi
At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> The cause is that the worker correctly received the pending-sync entry
> but never created a relcache entry for the relation using
> RelationBuildDesc. So rd_firstRelfilenodeSubid is not correctly set.
>
> I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation when a lock is
taken, but pending syncs are not considered.

Since parallel workers don't access the files, we could just ignore
the assertion safely, but I would rather set the rd_firstRelfilenodeSubid
flag at invalidation, as in the attached PoC patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..d66e914345 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1322,10 +1322,27 @@ RelationInitPhysicalAddr(Relation relation)
 	}
 	else
 	{
+		Oid			oldnode = relation->rd_node.relNode;
+
 		/* Consult the relation mapper */
 		relation->rd_node.relNode =
 			RelationMapOidToFilenode(relation->rd_id,
 									 relation->rd_rel->relisshared);
+
+		/*
+		 * The leader process might have replaced storage before staring this
+		 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+		 * sync hash in that case. Currently the information is not actually
+		 * used in worker processes, but make things tidy.
+		 */
+		if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+		{
+			if (RelFileNodeSkippingWAL(relation->rd_node))
+				relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+			else
+				relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
+		}
+
 		if (!OidIsValid(relation->rd_node.relNode))
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
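RelFileNodeSkippingWAL() is the pending-syncs lookup the PoC consults; its v13 logic is roughly this (a sketch based on src/backend/catalog/storage.c — details may differ):

bool
RelFileNodeSkippingWAL(RelFileNode rnode)
{
	if (XLogIsNeeded())
		return false;			/* under wal_level >= replica, nothing skips WAL */

	if (!pendingSyncHash ||
		hash_search(pendingSyncHash, &rnode, HASH_FIND, NULL) == NULL)
		return false;			/* no pending sync queued for this relfilenode */

	return true;
}

So the PoC simply re-derives rd_firstRelfilenodeSubid from the same hash that AssertPendingSyncConsistency() checks against, whenever the mapper hands the worker a new relNode.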
On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > The cause is that the worker correctly received the pending-sync entry
> > but never created a relcache entry for the relation using
> > RelationBuildDesc. So rd_firstRelfilenodeSubid is not correctly set.
> >
> > I'm investigating it.
>
> Relcaches are loaded from a file with old content at parallel worker
> startup. The relcache entry is corrected by invalidation when a lock is
> taken, but pending syncs are not considered.
>
> Since parallel workers don't access the files, we could just ignore
> the assertion safely, but I would rather set the rd_firstRelfilenodeSubid
> flag at invalidation, as in the attached PoC patch.

> [patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
> rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
> RelationBuildDesc() would do.]

As a PoC, this looks promising.  Thanks.  Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation.  I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.
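The mapped/unmapped distinction matters because RelationInitPhysicalAddr() fills in relNode by two different paths, and the PoC only touched the second one. An abridged sketch of the two branches (from the function the PoC patches):

	if (relation->rd_rel->relfilenode)
	{
		/* unmapped: pg_class carries the relfilenode directly
		 * (e.g. pg_amproc_fam_proc_index) */
		relation->rd_node.relNode = relation->rd_rel->relfilenode;
	}
	else
	{
		/* mapped: pg_class.relfilenode is 0, so consult the relation
		 * mapper (e.g. pg_attribute and its indexes) */
		relation->rd_node.relNode =
			RelationMapOidToFilenode(relation->rd_id,
									 relation->rd_rel->relisshared);
	}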
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
From: Kyotaro Horiguchi
At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
> On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > > The cause is that the worker correctly received the pending-sync entry
> > > but never created a relcache entry for the relation using
> > > RelationBuildDesc. So rd_firstRelfilenodeSubid is not correctly set.
> > >
> > > I'm investigating it.
> >
> > Relcaches are loaded from a file with old content at parallel worker
> > startup. The relcache entry is corrected by invalidation when a lock is
> > taken, but pending syncs are not considered.
> >
> > Since parallel workers don't access the files, we could just ignore
> > the assertion safely, but I would rather set the rd_firstRelfilenodeSubid
> > flag at invalidation, as in the attached PoC patch.
>
> > [patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
> > rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
> > RelationBuildDesc() would do.]
>
> As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> the following demonstrates the bug in the absence of your PoC?
>
> printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
suggestion seems better. I added several lines to create_index.sql.

> Please have the test try both a nailed-and-mapped relation and a "nailed, but
> not mapped" relation.  I am fairly confident that your PoC fixes the former
> case, but the latter may need additional code.

Mmm. You're right. I chose pg_amproc_fam_proc_index as a
nailed-but-not-mapped index.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From c4d53acd64011f1ed5652177923d010f4c50fa89 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Mon, 7 Sep 2020 20:30:30 +0900
Subject: [PATCH] Fix assertion failure during reindex while wal_level=minimal.

When wal_level=minimal, nailed relations are forgotten to set
rd_firstRelfilenodeSubid correctly on parallel workers. Fix it.
---
 src/backend/utils/cache/relcache.c         | 16 ++++++++++++++++
 src/test/regress/expected/create_index.out |  5 +++++
 src/test/regress/sql/create_index.sql      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..cd6c8bbc70 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1273,6 +1273,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid			oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1332,20 @@ RelationInitPhysicalAddr(Relation relation)
 		elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 			 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * Our leader process might have replaced storage before staring this
+	 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+	 * sync hash in that case. Currently the information is not actually used
+	 * in worker processes, but make things tidy.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 814416d936..6468418639 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2598,6 +2598,11 @@ REINDEX TABLE pg_toast.pg_toast_1260;
 ERROR:  must be owner of table pg_toast_1260
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 ERROR:  must be owner of index pg_toast_1260_index
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index; -- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; -- nailed but not mapped
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index f3667bacdc..d9cb3f14f6 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1115,6 +1115,12 @@ SET SESSION ROLE regress_reindexuser;
 REINDEX TABLE pg_toast.pg_toast_1260;
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index; -- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; -- nailed but not mapped
+
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
--
2.18.4
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
From: Kyotaro Horiguchi
At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
> > As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> > the following demonstrates the bug in the absence of your PoC?
> >
> > printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> > make check TEMP_CONFIG=/tmp/minimal.conf
>
> Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
> suggestion seems better. I added several lines to create_index.sql.
>
> > Please have the test try both a nailed-and-mapped relation and a "nailed, but
> > not mapped" relation.  I am fairly confident that your PoC fixes the former
> > case, but the latter may need additional code.
>
> Mmm. You're right. I chose pg_amproc_fam_proc_index as a
> nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From 44512f8672395dc40a001eb92b7ffca060eb411d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Mon, 7 Sep 2020 20:30:30 +0900
Subject: [PATCH v2] Fix assertion failure during reindex while wal_level=minimal.

When wal_level=minimal, nailed relations are forgotten to set
rd_firstRelfilenodeSubid correctly on parallel workers. Fix it.
---
 src/backend/utils/cache/relcache.c         | 16 ++++++++++++++++
 src/test/regress/expected/create_index.out |  5 +++++
 src/test/regress/sql/create_index.sql      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..8a61fd674e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1273,6 +1273,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid			oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1332,20 @@ RelationInitPhysicalAddr(Relation relation)
 		elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 			 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * Our leader process might have replaced storage before starting this
+	 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+	 * sync hash in that case. Currently the information is not actually used
+	 * in worker processes, but make things tidy.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 64c0c66859..66ec1465e1 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2732,6 +2732,11 @@ REINDEX TABLE pg_toast.pg_toast_1260;
 ERROR:  must be owner of table pg_toast_1260
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 ERROR:  must be owner of index pg_toast_1260_index
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index; -- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; -- nailed but not mapped
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 37f7259da9..6b104a2389 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1191,6 +1191,12 @@ SET SESSION ROLE regress_reindexuser;
 REINDEX TABLE pg_toast.pg_toast_1260;
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index; -- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; -- nailed but not mapped
+
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
--
2.18.4
On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
> > > As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> > > the following demonstrates the bug in the absence of your PoC?
> > >
> > > printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> > > make check TEMP_CONFIG=/tmp/minimal.conf
> >
> > Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
> > suggestion seems better. I added several lines to create_index.sql.
> >
> > > Please have the test try both a nailed-and-mapped relation and a "nailed, but
> > > not mapped" relation.  I am fairly confident that your PoC fixes the former
> > > case, but the latter may need additional code.
> >
> > Mmm. You're right. I chose pg_amproc_fam_proc_index as a
> > nailed-but-not-mapped index.
>
> I fixed a typo (s/staring/starting/).

At a glance, this looks reasonable.  If a closer look doesn't reveal problems,
I'll push this.
On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
> On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:
> > At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > > At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
> > > > As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> > > > the following demonstrates the bug in the absence of your PoC?
> > > >
> > > > printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> > > > make check TEMP_CONFIG=/tmp/minimal.conf
> > >
> > > Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
> > > suggestion seems better. I added several lines to create_index.sql.

After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons.  One now needs this command:

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

> > > > Please have the test try both a nailed-and-mapped relation and a "nailed, but
> > > > not mapped" relation.  I am fairly confident that your PoC fixes the former
> > > > case, but the latter may need additional code.
> > >
> > > Mmm. You're right. I chose pg_amproc_fam_proc_index as a
> > > nailed-but-not-mapped index.
> >
> > I fixed a typo (s/staring/starting/).
>
> At a glance, this looks reasonable.  If a closer look doesn't reveal problems,
> I'll push this.

RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL().  The attached version makes it so,
and I plan to push it.
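The resulting arrangement has roughly this shape (a sketch inferred from the thread, not the committed diff):

static void
RelationInitPhysicalAddr(Relation relation)
{
	/*
	 * Captured before either branch (pg_class or the relation mapper)
	 * fills it in; a freshly zeroed entry built by RelationBuildDesc()
	 * has InvalidOid here, so oldnode != relNode holds for new builds too.
	 */
	Oid			oldnode = relation->rd_node.relNode;

	/* ... existing code computing relation->rd_node ... */

	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
	{
		if (RelFileNodeSkippingWAL(relation->rd_node))
			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
		else
			relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
	}
}

Per the follow-up below, that zeroed-relNode property is what lets RelationBuildDesc() drop its own RelFileNodeSkippingWAL() call.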
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
From: Kyotaro Horiguchi
Thank you for checking.

At Tue, 8 Sep 2020 21:05:43 -0700, Noah Misch <noah@leadboat.com> wrote in
> On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
> After looking closer, I've moved the test to reindex_catalog.sql; see that
> file's header comment for the reasons.  One now needs this command:
>
> printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

I missed the file. It is surely the best place.

> RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
> can stop calling RelFileNodeSkippingWAL().  The attached version makes it so,
> and I plan to push it.

Sure. relNode is filled with zeros, so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().

Both changes look good to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
From: Kyotaro Horiguchi
At Wed, 09 Sep 2020 14:40:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> Sure. relNode is filled with zeros, so RelationInitPhysicalAddr() works
> correctly for RelationBuildDesc().
>
> Both changes look good to me.

Thank you for committing this!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center