Re: pg15b3: crash in paralell vacuum - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: pg15b3: crash in paralell vacuum
Msg-id 20220818140415.GN26426@telsasoft.com
In response to pg15b3: crash in paralell vacuum  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: pg15b3: crash in paralell vacuum
List pgsql-hackers
On Thu, Aug 18, 2022 at 08:34:06AM -0500, Justin Pryzby wrote:
> Unfortunately, it looks like the RPM packages are compiled with -O2, so this is
> of limited use.  So I'll be back shortly with more...

#3  0x00000000006874f1 in parallel_vacuum_process_all_indexes (pvs=0x25bdce0, num_index_scans=0, vacuum=vacuum@entry=false) at vacuumparallel.c:611
611                     Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);

(gdb) p *pvs
$1 = {pcxt = 0x25bc1e0, indrels = 0x25bbf70, nindexes = 8, shared = 0x7fc5184393a0, indstats = 0x7fc5184393e0, dead_items = 0x7fc5144393a0, buffer_usage = 0x7fc514439280, wal_usage = 0x7fc514439240,
  will_parallel_vacuum = 0x266d818, nindexes_parallel_bulkdel = 5, nindexes_parallel_cleanup = 0, nindexes_parallel_condcleanup = 5, bstrategy = 0x264f120, relnamespace = 0x0, relname = 0x0, indname = 0x0,
  status = PARALLEL_INDVAC_STATUS_INITIAL}

(gdb) p *indstats
$2 = {status = 11, parallel_workers_can_process = false, istat_updated = false, istat = {num_pages = 0, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_newly_deleted = 0, pages_deleted = 1,
    pages_free = 0}}
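
For context, here is a minimal C sketch of why the value above trips the assertion. The enum layout is an assumption modeled on PVIndVacStatus in vacuumparallel.c (only the INITIAL and NEED_CLEANUP names appear in this output), and both helper functions are hypothetical names used for illustration. Note that gdb prints status as a bare 11 rather than as a symbolic name, which is what it does when a value matches no enumerator.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed to mirror PVIndVacStatus in vacuumparallel.c.  Only the INITIAL
 * and NEED_CLEANUP names are confirmed by the gdb output; the other two
 * names and the numbering are assumptions.
 */
typedef enum PVIndVacStatus
{
    PARALLEL_INDVAC_STATUS_INITIAL = 0,
    PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
    PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
    PARALLEL_INDVAC_STATUS_COMPLETED
} PVIndVacStatus;

/* Under the assumed layout there are four states, numbered 0..3. */
static_assert(PARALLEL_INDVAC_STATUS_COMPLETED == 3,
              "expected four states numbered 0..3");

/* Hypothetical helper: is the raw value one of the named states? */
bool
indvac_status_is_valid(int raw)
{
    return raw >= PARALLEL_INDVAC_STATUS_INITIAL &&
           raw <= PARALLEL_INDVAC_STATUS_COMPLETED;
}

/* Hypothetical helper: the condition the Assert at line 611 checks. */
bool
indvac_status_is_initial(int raw)
{
    return raw == PARALLEL_INDVAC_STATUS_INITIAL;
}
```

With this layout, 11 is neither the initial state nor any other named state, so the failure looks less like a missed state transition and more like the shared indstats entry holding a value the code never assigns.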

(gdb) bt f
...
#3  0x00000000006874f1 in parallel_vacuum_process_all_indexes (pvs=0x25bdce0, num_index_scans=0, vacuum=vacuum@entry=false) at vacuumparallel.c:611
        indstats = 0x7fc5184393e0
        i = 0
        nworkers = 2
        new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP
        __func__ = "parallel_vacuum_process_all_indexes"
#4  0x0000000000687ef0 in parallel_vacuum_cleanup_all_indexes (pvs=<optimized out>, num_table_tuples=num_table_tuples@entry=409149, num_index_scans=<optimized out>, estimated_count=estimated_count@entry=true)
    at vacuumparallel.c:486
No locals.
#5  0x00000000004f80b8 in lazy_cleanup_all_indexes (vacrel=vacrel@entry=0x25bc510) at vacuumlazy.c:2679
        reltuples = 409149
        estimated_count = true
#6  0x00000000004f884a in lazy_scan_heap (vacrel=vacrel@entry=0x25bc510) at vacuumlazy.c:1278
        rel_pages = 67334
        blkno = 67334
        next_unskippable_block = 67334
        next_failsafe_block = 0
        next_fsm_block_to_vacuum = 0
        dead_items = 0x7fc5144393a0
        vmbuffer = 1300
        next_unskippable_allvis = true
        skipping_current_range = false
        initprog_index = {0, 1, 5}
        initprog_val = {1, 67334, 11184809}
        __func__ = "lazy_scan_heap"
#7  0x00000000004f925f in heap_vacuum_rel (rel=0x7fc52df6b820, params=0x7ffd74f74620, bstrategy=0x264f120) at vacuumlazy.c:534
        vacrel = 0x25bc510
        verbose = true
        instrument = <optimized out>
        aggressive = false
        skipwithvm = true
        frozenxid_updated = false
        minmulti_updated = false
        OldestXmin = 32759288
        FreezeLimit = 4277726584
        OldestMxact = 157411
        MultiXactCutoff = 4290124707
        orig_rel_pages = 67334
        new_rel_pages = <optimized out>
        new_rel_allvisible = 4
        ru0 = {tv = {tv_sec = 1660830451, tv_usec = 473980}, ru = {ru_utime = {tv_sec = 0, tv_usec = 317891}, ru_stime = {tv_sec = 1, tv_usec = 212372}, {ru_maxrss = 74524, __ru_maxrss_word = 74524}, {ru_ixrss = 0,
              __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 18870, __ru_minflt_word = 18870}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0,
              __ru_nswap_word = 0}, {ru_inblock = 1124750, __ru_inblock_word = 1124750}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0,
              __ru_nsignals_word = 0}, {ru_nvcsw = 42, __ru_nvcsw_word = 42}, {ru_nivcsw = 35, __ru_nivcsw_word = 35}}}
        starttime = 714145651473980
        startreadtime = 0
        startwritetime = 0
        startwalusage = {wal_records = 2, wal_fpi = 0, wal_bytes = 421}
        StartPageHit = 50
        StartPageMiss = 0
        StartPageDirty = 0
        errcallback = {previous = 0x0, callback = 0x4f5f41 <vacuum_error_callback>, arg = 0x25bc510}
        indnames = 0x266d838
        __func__ = "heap_vacuum_rel"

This is a qemu VM which (full disclosure) has crashed a few times recently due
to OOM.  This is probably a postgres bug, but conceivably it's being tickled by
bad data (although the VM crashing shouldn't cause that either, once recovery
has completed).  This is also an instance that was pg_upgraded from v14 (and
earlier versions) to v15b1 and then b2, so it's conceivable that there are
weird data pages that wouldn't be written by beta3.  But that doesn't seem to
be the issue here anyway.

-- 
Justin


