Using Valgrind to detect faulty buffer accesses (no pin or buffercontent lock held) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Using Valgrind to detect faulty buffer accesses (no pin or buffercontent lock held)
Msg-id CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
List pgsql-hackers
I recently expressed an interest in using Valgrind memcheck to detect
access to pages whose buffers do not have a pin held in the backend,
or do not have a buffer lock held (the latter check makes sense for
pages owned by index access methods). I came up with a quick and dirty
patch, and confirmed that it detects a bug in nbtree VACUUM that I had
happened to spot:

https://postgr.es/m/CAH2-Wz=WRu6NMWtit2weDnuGxdsWeNyFygeBP_zZ2Sso0YAGFg@mail.gmail.com

(This is a bug in commit 857f9c36cda.)

Alvaro wrote a similar patch back in 2015, which I'd forgotten about
but was reminded of today:

https://postgr.es/m/20150723195349.GW5596@postgresql.org

I took his version (which was better than my rough original) and
rebased it -- that's attached as the first patch. The second patch
takes the general idea further: it has nbtree mark a page as NOACCESS
in a similar way whenever no buffer lock is held on its buffer
(whether or not the buffer is still pinned).
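
To make the two policies concrete, here is a minimal sketch of the idea,
not the code from either patch. SketchBuffer and the sketch_* functions
are hypothetical names invented for illustration; only the two
VALGRIND_MAKE_MEM_* client requests are real memcheck macros, and they
are replaced by no-ops when valgrind/memcheck.h is not installed.

```c
#include <assert.h>
#include <stdbool.h>

/* Use the real memcheck client requests when the header is available;
 * otherwise fall back to no-ops so the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define SKETCH_HAVE_MEMCHECK 1
#  endif
#endif
#ifndef SKETCH_HAVE_MEMCHECK
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len) ((void) (addr), (void) (len))
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len)  ((void) (addr), (void) (len))
#endif

#define SKETCH_PAGE_SIZE 8192

/* Hypothetical stand-in for a shared buffer; not a PostgreSQL struct. */
typedef struct SketchBuffer
{
    char page[SKETCH_PAGE_SIZE];
    int  pin_count;       /* pins held by this backend */
    bool content_locked;  /* buffer lock held by this backend */
    bool require_lock;    /* stricter policy, e.g. for index pages */
    bool accessible;      /* mirrors what memcheck was last told */
} SketchBuffer;

/* Recompute whether the page may be touched and tell memcheck. */
void sketch_update_access(SketchBuffer *buf)
{
    bool ok = buf->pin_count > 0 &&
              (!buf->require_lock || buf->content_locked);

    if (ok)
        (void) VALGRIND_MAKE_MEM_DEFINED(buf->page, sizeof(buf->page));
    else
        (void) VALGRIND_MAKE_MEM_NOACCESS(buf->page, sizeof(buf->page));
    buf->accessible = ok;
}

void sketch_pin(SketchBuffer *buf)
{
    buf->pin_count++;
    sketch_update_access(buf);
}

void sketch_unpin(SketchBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
    sketch_update_access(buf);
}

void sketch_lock(SketchBuffer *buf)
{
    assert(buf->pin_count > 0 && !buf->content_locked);
    buf->content_locked = true;
    sketch_update_access(buf);
}

void sketch_unlock(SketchBuffer *buf)
{
    assert(buf->content_locked);
    buf->content_locked = false;
    sketch_update_access(buf);
}
```

With require_lock left false this models the pin-only check; setting it
to true models the stricter rule, where a pinned but unlocked page is
still off limits. Under memcheck, any load or store to buf->page while
it is NOACCESS is reported as an invalid access.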

Running the regression tests under Valgrind memcheck with this second
patch applied revealed two more bugs in nbtree page deletion. These additional
bugs are probably of lower severity than the first one, since we at
least have a buffer pin (we just don't have buffer locks). All three
bugs are very similar, though: they all involve dereferencing a
pointer to the special area of a page at a point where the underlying
buffer is no longer safe to access.
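
The shared shape of these bugs can be sketched as follows. This is an
illustration under assumptions, not the nbtree code and not necessarily
how the attached patches fix things: SketchPage, SketchSpecial and the
sketch_* functions are hypothetical, and sketch_release() simulates
"another backend may now change the page" by overwriting it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical special area kept at the end of a page. */
typedef struct SketchSpecial
{
    uint32_t right_sibling;
    uint16_t flags;
} SketchSpecial;

/* Hypothetical page layout; not PostgreSQL's. */
typedef struct SketchPage
{
    char          data[64];
    SketchSpecial special;
} SketchPage;

/* Stand-in for dropping the lock/pin: after this, the page contents
 * can no longer be trusted, so the sketch clobbers them. */
void sketch_release(SketchPage *page)
{
    assert(page != NULL);
    memset(page, 0xFF, sizeof(*page));
}

/* Buggy pattern: the pointer into the special area outlives the
 * point where the buffer stops being safe to read. */
uint32_t sketch_next_unsafe(SketchPage *page)
{
    SketchSpecial *opaque = &page->special;

    sketch_release(page);
    return opaque->right_sibling;   /* reads a page we no longer own */
}

/* One common fix: copy what is needed into a local variable while
 * the buffer is still safe to access, then release. */
uint32_t sketch_next_safe(SketchPage *page)
{
    SketchSpecial *opaque = &page->special;
    uint32_t       next = opaque->right_sibling;

    sketch_release(page);
    return next;
}
```

In a real backend the unsafe version usually appears to work, because
the page often has not changed yet; marking the page NOACCESS at release
time is what lets memcheck flag the late dereference deterministically.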

The final two patches fix the two newly discovered bugs -- I don't
have a fix for the first bug yet, since that one is more complicated
(and probably more serious). The regression tests run with Valgrind
will complain about all three bugs if you just apply the first two
patches (though you only need the first patch to see a complaint about
the first, more serious bug when the tests are run).

-- 
Peter Geoghegan

