Re: xid wraparound danger due to INDEX_CLEANUP false - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: xid wraparound danger due to INDEX_CLEANUP false
Date
Msg-id CAH2-Wz=WRu6NMWtit2weDnuGxdsWeNyFygeBP_zZ2Sso0YAGFg@mail.gmail.com
Whole thread Raw
In response to Re: xid wraparound danger due to INDEX_CLEANUP false  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: xid wraparound danger due to INDEX_CLEANUP false  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Apr 22, 2020 at 6:05 PM Peter Geoghegan <pg@bowt.ie> wrote:
> (It would be nice if we could teach Valgrind to "poison" buffers when
> we don't have a pin held...that would probably have caught this issue
> almost immediately.)

I can get Valgrind to complain about it when the regression tests are
run with the attached patch applied. I see this in the logs at several
points when "make installcheck" runs:

==1082059== VALGRINDERROR-BEGIN
==1082059== Invalid read of size 4
==1082059==    at 0x21D8DE: btvacuumpage (nbtree.c:1370)
==1082059==    by 0x21DA61: btvacuumscan (nbtree.c:1039)
==1082059==    by 0x21DBD5: btbulkdelete (nbtree.c:879)
==1082059==    by 0x215821: index_bulk_delete (indexam.c:698)
==1082059==    by 0x20FDCE: lazy_vacuum_index (vacuumlazy.c:2427)
==1082059==    by 0x2103EA: lazy_vacuum_all_indexes (vacuumlazy.c:1794)
==1082059==    by 0x211EA1: lazy_scan_heap (vacuumlazy.c:1681)
==1082059==    by 0x211EA1: heap_vacuum_rel (vacuumlazy.c:510)
==1082059==    by 0x360414: table_relation_vacuum (tableam.h:1457)
==1082059==    by 0x360414: vacuum_rel (vacuum.c:1880)
==1082059==    by 0x361785: vacuum (vacuum.c:449)
==1082059==    by 0x361F0E: ExecVacuum (vacuum.c:249)
==1082059==    by 0x4D979C: standard_ProcessUtility (utility.c:823)
==1082059==    by 0x4D9C7F: ProcessUtility (utility.c:522)
==1082059==    by 0x4D6791: PortalRunUtility (pquery.c:1157)
==1082059==    by 0x4D725F: PortalRunMulti (pquery.c:1303)
==1082059==    by 0x4D7CEF: PortalRun (pquery.c:779)
==1082059==    by 0x4D3BB7: exec_simple_query (postgres.c:1239)
==1082059==    by 0x4D4ABD: PostgresMain (postgres.c:4315)
==1082059==    by 0x45B0C9: BackendRun (postmaster.c:4510)
==1082059==    by 0x45B0C9: BackendStartup (postmaster.c:4202)
==1082059==    by 0x45B0C9: ServerLoop (postmaster.c:1727)
==1082059==    by 0x45C754: PostmasterMain (postmaster.c:1400)
==1082059==    by 0x3BDD68: main (main.c:210)
==1082059==  Address 0x6cc7378 is in a rw- anonymous segment
==1082059==
==1082059== VALGRINDERROR-END

(The line numbers might be slightly different to master here, but the
line from btvacuumpage() is definitely the one that accesses the
special area of the B-Tree page after we drop the pin.)

This patch is very rough -- it was just the first thing that I tried.
I don't know how Valgrind remembers the status of shared memory
regions across backends when they're marked with
VALGRIND_MAKE_MEM_NOACCESS(). Even still, this idea looks promising. I
should try to come up with a committable patch before too long.

The good news is that the error I showed is the only error that I see,
at least with this rough patch + "make installcheck". It's possible
that the patch isn't as effective as it could be, though. For one
thing, it definitely won't detect incorrect buffer accesses where a
pin is held but a buffer lock is not held. That seems possible, but a
bit harder.

-- 
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: create partition table caused server crashed withself-referencing foreign key
Next
From: Amit Kapila
Date:
Subject: Re: PG compilation error with Visual Studio 2015/2017/2019