Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id 20240407190731.izm3mdazednrsiqk@awork3.anarazel.de
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers
Hi,

On 2024-04-01 11:53:28 +0900, Masahiko Sawada wrote:
> On Fri, Mar 29, 2024 at 4:21 PM John Naylor <johncnaylorls@gmail.com> wrote:
> > I've marked it Ready for Committer.
>
> Thank you! I've attached the patch that I'm going to push tomorrow.

Locally I ran a 32bit build with ubsan enabled (by accident actually), which
complains:

performing post-bootstrap initialization ...
----------------------------------- stderr -----------------------------------
../../../../../home/andres/src/postgresql/src/backend/access/common/tidstore.c:341:24: runtime error: member access
withinmisaligned address 0xffb6258e for type 'struct BlocktableEntry', which requires 4 byte alignment
 
0xffb6258e: note: pointer points here
 00 00 02 00 01 40  dc e9 83 0b 80 48 70 ee  00 00 00 00 00 00 00 01  17 00 00 00 f8 d4 a6 ee  e8 25
             ^
    #0 0x814097e in TidStoreSetBlockOffsets
../../../../../home/andres/src/postgresql/src/backend/access/common/tidstore.c:341
    #1 0x826560a in dead_items_add ../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:2889
    #2 0x825f8da in lazy_scan_prune
../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:1502
    #3 0x825da71 in lazy_scan_heap ../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:977
    #4 0x825ad8f in heap_vacuum_rel ../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:499
    #5 0x8697e97 in table_relation_vacuum ../../../../../home/andres/src/postgresql/src/include/access/tableam.h:1725
    #6 0x869fca6 in vacuum_rel ../../../../../home/andres/src/postgresql/src/backend/commands/vacuum.c:2206
    #7 0x869a0fd in vacuum ../../../../../home/andres/src/postgresql/src/backend/commands/vacuum.c:622
    #8 0x869986b in ExecVacuum ../../../../../home/andres/src/postgresql/src/backend/commands/vacuum.c:449
    #9 0x8e5f832 in standard_ProcessUtility ../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:859
    #10 0x8e5e5f6 in ProcessUtility ../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:523
    #11 0x8e5b71a in PortalRunUtility ../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:1158
    #12 0x8e5be80 in PortalRunMulti ../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:1315
    #13 0x8e59f9b in PortalRun ../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:791
    #14 0x8e4d5f3 in exec_simple_query ../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:1274
    #15 0x8e55159 in PostgresMain ../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4680
    #16 0x8e54445 in PostgresSingleUserMain ../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4136
    #17 0x88bb55e in main ../../../../../home/andres/src/postgresql/src/backend/main/main.c:194
    #18 0xf76f47c4  (/lib/i386-linux-gnu/libc.so.6+0x237c4) (BuildId: fe79efe6681a919714a4e119da2baac3a4953fbf)
    #19 0xf76f4887 in __libc_start_main (/lib/i386-linux-gnu/libc.so.6+0x23887) (BuildId:
fe79efe6681a919714a4e119da2baac3a4953fbf)
    #20 0x80d40f7 in _start
(/srv/dev/build/postgres/m-dev-assert-32/tmp_install/srv/dev/install/postgres/m-dev-assert-32/bin/postgres+0x80d40f7)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../../../../../home/andres/src/postgresql/src/backend/access/common/tidstore.c:341:24in
 
Aborted (core dumped)
child process exited with exit code 134
initdb: data directory "/srv/dev/build/postgres/m-dev-assert-32/tmp_install/initdb-template" not removed at user's
request


At first I was confused why CI didn't find this. Turns out that, for me, this
is only triggered without compiler optimizations, and I had used -O0 while CI
uses some optimizations.

Backtrace:
#9  0x0814097f in TidStoreSetBlockOffsets (ts=0xb8dfde4, blkno=15, offsets=0xffb6275c, num_offsets=11)
    at ../../../../../home/andres/src/postgresql/src/backend/access/common/tidstore.c:341
#10 0x0826560b in dead_items_add (vacrel=0xb8df6d4, blkno=15, offsets=0xffb6275c, num_offsets=11)
    at ../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:2889
#11 0x0825f8db in lazy_scan_prune (vacrel=0xb8df6d4, buf=24, blkno=15, page=0xeeb6c000 "", vmbuffer=729,
all_visible_according_to_vm=false,
    has_lpdead_items=0xffb62a1f) at
../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:1502
#12 0x0825da72 in lazy_scan_heap (vacrel=0xb8df6d4) at
../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:977
#13 0x0825ad90 in heap_vacuum_rel (rel=0xb872810, params=0xffb62e90, bstrategy=0xb99d5e0)
    at ../../../../../home/andres/src/postgresql/src/backend/access/heap/vacuumlazy.c:499
#14 0x08697e98 in table_relation_vacuum (rel=0xb872810, params=0xffb62e90, bstrategy=0xb99d5e0)
    at ../../../../../home/andres/src/postgresql/src/include/access/tableam.h:1725
#15 0x0869fca7 in vacuum_rel (relid=1249, relation=0x0, params=0xffb62e90, bstrategy=0xb99d5e0)
    at ../../../../../home/andres/src/postgresql/src/backend/commands/vacuum.c:2206
#16 0x0869a0fe in vacuum (relations=0xb99de08, params=0xffb62e90, bstrategy=0xb99d5e0, vac_context=0xb99d550,
isTopLevel=true)

(gdb) p/x page
$1 = 0xffb6258e


I think compiler optimizations are only tangentially involved here, they
trigger the stack frame layout to change, e.g. because some variable will just
exist in a register.


Looking at the code, the failure isn't suprising anymore:
    char        data[MaxBlocktableEntrySize];
    BlocktableEntry *page = (BlocktableEntry *) data;

'char' doesn't enforce any alignment, but you're storing a BlocktableEntry in
a char[]. You can't just do that.  Look at how we do that for
e.g. PGAlignedblock.


With the attached minimal fix, the tests pass again.

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Next
From: "Andrey M. Borodin"
Date:
Subject: Re: MultiXact\SLRU buffers configuration