Re: PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory |
| Date | |
| Msg-id | wjizhlvtyrr7zee55g7lo6ob5yv5jmkk3igadcuwpvodt7yvv5@wgbdi6v7v5nv |
| In response to | PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory (Tomas Vondra <tomas@vondra.me>) |
| Responses | Re: PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory |
| List | pgsql-hackers |
Hi,
On 2026-05-04 10:18:31 +0200, Tomas Vondra wrote:
> A couple of days ago we got a report about an incorrect shmem size
> calculation in btree [1], leading to a buffer overflow / memory
> corruption. That kind of bug has become much less common in our code,
> thanks to valgrind and similar tools, but it took me a while to realize
> valgrind won't catch this one, because we only instrument private
> memory (palloc et al), while shmem is left alone.
>
> I was wondering if it's feasible to improve this. Attached is a trivial
> patch that adjusts shm_toc.c to add a couple NOACCESS bytes after each
> entry in the segment. It seems to do the trick - with this we get a
> reasonable report (for the reproducer provided in the bug report, before
> it got fixed by 748d871b7c) from valgrind, with invalid accesses. See
> the attached .log for an example. It's much better than the confusing
> crashes due to corrupted state.
>
> There's an issue, though. It seems the valgrind memory markings are not
> shared between processes. The leader sets the shm_toc up, marks the
> ranges as NOACCESS, and then checks it while accessing the memory. But
> the parallel workers don't seem to see this, and so will produce no
> reports. I'm assuming this is the case, because all the reports come
> from the leader, never from the workers. Maybe there's a different
> explanation (e.g. maybe it's just the leader touching the memory?).
I assume the issue is just that the workers don't have the NOACCESS markings?
I think you'd need to apply them in every process using the shm_toc, either by
doing it in shm_toc_attach() or in shm_toc_lookup().
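Since valgrind's accessibility state is per-process, each backend that maps the segment would have to redo the marking itself. A minimal sketch of what that could look like (mark_entry_redzone and NUM_NOACCESS_BYTES are illustrative names following the PoC patch, not existing APIs; the fallback macro lets it build without valgrind headers):

```c
/*
 * Hypothetical sketch: re-apply the red-zone marking in each process that
 * maps the segment. mark_entry_redzone and NUM_NOACCESS_BYTES are
 * illustrative, modeled on the PoC patch.
 */
#include <stddef.h>

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
/* no-op fallback so the sketch builds without valgrind headers */
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len) ((void) 0)
#endif

#define NUM_NOACCESS_BYTES 32

/*
 * Valgrind's accessibility bits are per-process: the leader's
 * VALGRIND_MAKE_MEM_NOACCESS calls are invisible to parallel workers,
 * so every backend must redo the marking for each entry it uses,
 * e.g. right after looking the entry up in the toc.
 */
static void
mark_entry_redzone(void *entry, size_t nbytes)
{
	VALGRIND_MAKE_MEM_NOACCESS((char *) entry + nbytes, NUM_NOACCESS_BYTES);
}
```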
> An alternative would be to do mprotect(), but unfortunately it requires
> page-aligned ranges, which makes it somewhat useless for small overflows
> of a couple bytes (like here). It should work cross-process, I think,
It doesn't work across processes. Every process has its own mprotect "view".
> FWIW the PoC patch adds a 32-byte chunk, not just a single byte. This is
> intentional, because if the state is an array, it's quite possible the
> invalid access steps over a fair number of bytes. I'm actually thinking
> it should be even larger.
>
> This modifies just the dynamic shmem, but maybe we should do this even
> for the "regular" shmem allocated at start. Issues in that would likely
> cause crashes pretty quickly (unlike the btree issue, which requires a
> somewhat special reproducer), but a nice valgrind report helps.
I can tell you from experience that no, it's not necessarily quickly caught.
So yes, I think we should definitely do that.
> /*
> * Allocate shared memory from a segment managed by a table of contents.
> *
> @@ -119,9 +127,19 @@ shm_toc_allocate(shm_toc *toc, Size nbytes)
> }
> vtoc->toc_allocated_bytes += nbytes;
>
> +#ifdef USE_VALGRIND
> + vtoc->toc_allocated_bytes += NUM_NOACCESS_BYTES;
> +#endif
> +
> SpinLockRelease(&toc->toc_mutex);
>
> - return ((char *) toc) + (total_bytes - allocated_bytes - nbytes);
> +#ifdef USE_VALGRIND
> + /* make the bytes at the end no-access */
> + VALGRIND_MAKE_MEM_NOACCESS(((char *) toc) + (total_bytes - allocated_bytes - NUM_NOACCESS_BYTES),
> + NUM_NOACCESS_BYTES);
> +#endif
> +
> + return ((char *) toc) + (total_bytes - allocated_bytes - nbytes - NUM_NOACCESS_BYTES);
> }
The size is already rounded up by that point:
/*
* Make sure request is well-aligned. XXX: MAXALIGN is not enough,
* because atomic ops might need a wider alignment. We don't have a
* proper definition for the minimum to make atomic ops safe, but
* BUFFERALIGN ought to be enough.
*/
nbytes = BUFFERALIGN(nbytes);
Which means that you'll often have up to 32 bytes of padding at the end of the
allocation already (ALIGNOF_BUFFER = 32). I don't care about the waste, but the
ALIGNOF_BUFFER padding will often prevent detecting smaller out-of-bounds
accesses.
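To quantify that: an overflow that lands in the rounding padding never reaches the NOACCESS bytes, so it stays invisible. A small arithmetic sketch (BUFFERALIGN reproduced after PostgreSQL's TYPEALIGN(ALIGNOF_BUFFER, len); hidden_overflow_room is an illustrative helper, not a real API):

```c
/*
 * Sketch of how much out-of-bounds room the BUFFERALIGN padding leaves:
 * bytes written past the requested size but still inside the rounded-up
 * allocation are invisible, even with a NOACCESS red zone right after it.
 */
#include <stddef.h>
#include <stdint.h>

#define ALIGNOF_BUFFER 32
#define BUFFERALIGN(LEN) \
	(((uintptr_t) (LEN) + (ALIGNOF_BUFFER - 1)) & \
	 ~((uintptr_t) (ALIGNOF_BUFFER - 1)))

/* padding between the requested and the aligned allocation size */
static size_t
hidden_overflow_room(size_t nbytes)
{
	return (size_t) BUFFERALIGN(nbytes) - nbytes;
}
```

E.g. a 40-byte request is rounded up to 64, so a 24-byte overflow goes undetected; only requests that are already a multiple of 32 leave no slack before the red zone.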
Greetings,
Andres Freund