Home > mailing lists

Re: suspicious valgrind reports about radixtree/tidstore on arm64 - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: suspicious valgrind reports about radixtree/tidstore on arm64
Date	June 20, 2024 01:11:37
Msg-id	CAD21AoAjnvXxgoqAnyO4sfxBBxTSzmPz6BLuwWsrZEnLAPgFaQ@mail.gmail.com Whole thread
In response to	Re: suspicious valgrind reports about radixtree/tidstore on arm64 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: suspicious valgrind reports about radixtree/tidstore on arm64 Re: suspicious valgrind reports about radixtree/tidstore on arm64
List	pgsql-hackers

Tree view

Hi,

On Thu, Jun 20, 2024 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > I've reproduced what looks like about the same thing on
> > my Pi 4 using Fedora 38: just run "make installcheck-parallel"
> > under valgrind, and kaboom.  Definitely needs investigation.
>
> The problem appears to be that RT_ALLOC_NODE doesn't bother to
> initialize the chunks[] array when making a RT_NODE_16 node.
> If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries,
> then when RT_NODE_16_SEARCH_EQ applies vector operations that
> read the entire array, it's operating partially on uninitialized
> data.  Now, that's actually OK because of the "mask off invalid
> entries" step, but aarch64 valgrind complains anyway.
>
> I hypothesize that the reason we're not seeing equivalent failures
> on x86_64 is one of
>
> 1. x86_64 valgrind is stupider than aarch64, and fails to track that
> the contents of the SIMD registers are only partially defined.
>
> 2. x86_64 valgrind is smarter than aarch64, and is able to see
> that the "mask off invalid entries" step removes all the
> potentially-uninitialized bits.
>
> The first attached patch, "radixtree-fix-minimal.patch", is enough
> to stop the aarch64 valgrind failure for me.  However, I think
> that the coding here is pretty penny-wise and pound-foolish,
> and that what we really ought to do is the second patch,
> "radixtree-fix-proposed.patch".  I do not believe that asking
> memset to zero the three-byte RT_NODE structure produces code
> that's either shorter or faster than having it zero 8 bytes
> (as for RT_NODE_4) or having it do that and then separately
> zero some more stuff (as for the larger node types).  Leaving
> RT_NODE_4's chunks[] array uninitialized is going to bite us
> someday, too, even if it doesn't right now.  So I think we
> ought to just zero the whole fixed-size part of the nodes,
> which is what radixtree-fix-proposed.patch does.

I agree with radixtree-fix-proposed.patch. Even if we zero more fields
in the node it would not add noticeable overheads.

>
> (The RT_NODE_48 case could be optimized a bit if we cared
> to swap the order of its slot_idxs[] and isset[] arrays;
> then the initial zeroing could just go up to slot_idxs[].
> I don't know if there's any reason why the current order
> is preferable.)

IIUC there is no particular reason for the current order in RT_NODE_48.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

From: Peter Smith
Date: 20 June 2024, 01:05:35
Subject: Re: DOCS: Generated table columns are skipped by logical replication

From: Noah Misch
Date: 20 June 2024, 01:29:08
Subject: datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE

Re: suspicious valgrind reports about radixtree/tidstore on arm64 - Mailing list pgsql-hackers

Previous

Next