PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory - Mailing list pgsql-hackers

From Tomas Vondra
Subject PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory
Date
Msg-id d552eb6b-b1f2-49ab-8aef-ce10eff31d26@vondra.me
Whole thread
Responses Re: PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory
List pgsql-hackers
Hi,

A couple of days ago we got a report regarding an incorrect shmem size
calculation in btree [1], leading to a buffer overflow / memory
corruption. Which became a much less common issue in our code, thanks to
valgrind and similar tools. But it took me a while to realize valgrind
won't catch this because we only instrument private memory (palloc et
al), while shmem is left alone.

I was wondering if it's feasible to improve this. Attached is a trivial
patch that adjusts shm_toc.c to add a couple NOACCESS bytes after each
entry in the segment. It seems to do the trick - with this we get a
reasonable report (for the reproducer provided in the bug report, before
it got fixed by 748d871b7c) from valgrind, with invalid accesses. See
the attached .log for an example. It's much better than the confusing
crashes due to corrupted state.

There's an issue, though. It seems the valgrind memory markings are not
shared between processes. The leader sets the shm_toc up, marks the
ranges as NOACCESS, and then checks it while accessing the memory. But
the parallel workers don't seem to see this, and so will produce no
reports. I'm assuming this is the case, because all the reports come
from the leader, never from the workers. Maybe there's a different
explanation (e.g. maybe it's just the leader touching the memory?).

Still, it seems useful even with this limitation. If the leader
participates, it will produce a nice report (unless the worker crashes
first, because of the corrupted state). But we also have

    debug_parallel_query = regress

in which case everything should run in a single process, and work just
fine. Maybe that's good enough.

An alternative would be to do mprotect(), but unfortunately it requires
page-aligned ranges, which makes it somewhat useless for small overflows
of a couple bytes (like here). It should work cross-process, I think,

FWIW the PoC patch adds a 32-byte chunk, not just a single byte. This is
intentional, because if the state is an array, it's quite possible the
invalid access steps over a fair number of bytes. I'm actually thinking
it should be even larger.

This modifies just the dynamic shmem, but maybe we should do this even
for the "regular" shmem allocated at start. Issues in that would likely
cause crashes pretty quick (unlike the btree issue, which requires a
somewhat special reproducer), but a nice valgrind report helps.

regards


[1]
https://www.postgresql.org/message-id/CAGCUe0Lwk3C0qdkBa%2BOLpYc7yXwW%3Dpbaz8Sju4xMXEQAmyp%2B5g%40mail.gmail.com

-- 
Tomas Vondra

Attachment

pgsql-hackers by date:

Previous
From: Shinya Kato
Date:
Subject: Call EndCopyFrom() after initial table sync in logical replication
Next
From: Andres Freund
Date:
Subject: Re: Make printtup a bit faster