v13: CLUSTER segv with wal_level=minimal and parallel index creation - Mailing list pgsql-hackers

From Justin Pryzby
Subject v13: CLUSTER segv with wal_level=minimal and parallel index creation
Date
Msg-id 20200907023737.GA7158@telsasoft.com
Responses Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
Following a bulk load, a CLUSTER command run by a maintenance script crashed.
This is currently reproducible on that instance, so please let me know if I can
provide more information.

< 2020-09-06 15:44:16.369 MDT  >LOG:  background worker "parallel worker" (PID 2576) was terminated by signal 6: Aborted
< 2020-09-06 15:44:16.369 MDT  >DETAIL:  Failed process was running: CLUSTER pg_attribute USING pg_attribute_relid_attnam_index

The crash happens during:
ts=# REINDEX INDEX pg_attribute_relid_attnum_index;
..but not:
ts=# REINDEX INDEX pg_attribute_relid_attnam_index ;

 pg_catalog | pg_attribute_relid_attnam_index | index | postgres | pg_attribute | permanent   | 31 MB | 
 pg_catalog | pg_attribute_relid_attnum_index | index | postgres | pg_attribute | permanent   | 35 MB | 

I suspect this commit:
|commit c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal.

In fact, I set wal_level=minimal for the bulk load.  Note also:
 override             | data_checksums                  | on
 configuration file   | checkpoint_timeout              | 60
 configuration file   | maintenance_work_mem            | 1048576
 configuration file   | max_wal_senders                 | 0
 configuration file   | wal_compression                 | on
 configuration file   | wal_level                       | minimal
 configuration file   | fsync                           | off
 configuration file   | full_page_writes                | off
 default              | server_version                  | 13beta3

(gdb) bt
#0  0x00007ff9999ad387 in raise () from /lib64/libc.so.6
#1  0x00007ff9999aea78 in abort () from /lib64/libc.so.6
#2  0x0000000000921da5 in ExceptionalCondition (conditionName=conditionName@entry=0xad4078 "relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", errorType=errorType@entry=0x977f49 "FailedAssertion",
    fileName=fileName@entry=0xad3068 "relcache.c", lineNumber=lineNumber@entry=2976) at assert.c:67
#3  0x000000000091a08b in AssertPendingSyncConsistency (relation=0x7ff99c2a70b8) at relcache.c:2976
#4  AssertPendingSyncs_RelationCache () at relcache.c:3036
#5  0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
#6  0x000000000053b1a4 in CommitTransaction () at xact.c:2118
#7  0x000000000053b826 in EndParallelWorkerTransaction () at xact.c:5300
#8  0x000000000052fcf7 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1479
#9  0x000000000076047a in StartBackgroundWorker () at bgworker.c:813
#10 0x000000000076d88d in do_start_bgworker (rw=0x23ac110) at postmaster.c:5865
#11 maybe_start_bgworkers () at postmaster.c:6091
#12 0x000000000076e43e in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5260
#13 <signal handler called>
#14 0x00007ff999a6c983 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000004887bc in ServerLoop () at postmaster.c:1691
#16 0x000000000076fb45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x237d280) at postmaster.c:1400
#17 0x000000000048a83d in main (argc=3, argv=0x237d280) at main.c:210

(gdb) bt f
...
#4  AssertPendingSyncs_RelationCache () at relcache.c:3036
        status = {hashp = 0x23cba50, curBucket = 449, curEntry = 0x0}
        locallock = <optimized out>
        rels = 0x23ff018
        maxrels = <optimized out>
        nrels = 0
        idhentry = <optimized out>
        i = <optimized out>
#5  0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
        pending = <optimized out>
        nrels = 0
        maxrels = 0
        srels = 0x0
        scan = {hashp = 0x23edf60, curBucket = 9633000, curEntry = 0xe01600 <TopTransactionStateData>}
        pendingsync = <optimized out>
#6  0x000000000053b1a4 in CommitTransaction () at xact.c:2118
        s = 0xe01600 <TopTransactionStateData>
        latestXid = <optimized out>
        is_parallel_worker = true
        __func__ = "CommitTransaction"



