Hi,
On Thu, Jun 20, 2024 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > I've reproduced what looks like about the same thing on
> > my Pi 4 using Fedora 38: just run "make installcheck-parallel"
> > under valgrind, and kaboom. Definitely needs investigation.
>
> The problem appears to be that RT_ALLOC_NODE doesn't bother to
> initialize the chunks[] array when making a RT_NODE_16 node.
> If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries,
> then when RT_NODE_16_SEARCH_EQ applies vector operations that
> read the entire array, it's operating partially on uninitialized
> data. Now, that's actually OK because of the "mask off invalid
> entries" step, but aarch64 valgrind complains anyway.
>
> I hypothesize that the reason we're not seeing equivalent failures
> on x86_64 is one of
>
> 1. x86_64 valgrind is stupider than aarch64, and fails to track that
> the contents of the SIMD registers are only partially defined.
>
> 2. x86_64 valgrind is smarter than aarch64, and is able to see
> that the "mask off invalid entries" step removes all the
> potentially-uninitialized bits.
>
> The first attached patch, "radixtree-fix-minimal.patch", is enough
> to stop the aarch64 valgrind failure for me. However, I think
> that the coding here is pretty penny-wise and pound-foolish,
> and that what we really ought to do is the second patch,
> "radixtree-fix-proposed.patch". I do not believe that asking
> memset to zero the three-byte RT_NODE structure produces code
> that's either shorter or faster than having it zero 8 bytes
> (as for RT_NODE_4) or having it do that and then separately
> zero some more stuff (as for the larger node types). Leaving
> RT_NODE_4's chunks[] array uninitialized is going to bite us
> someday, too, even if it doesn't right now. So I think we
> ought to just zero the whole fixed-size part of the nodes,
> which is what radixtree-fix-proposed.patch does.
I agree with radixtree-fix-proposed.patch. Even if we zero more fields
in the node it would not add noticeable overheads.
>
> (The RT_NODE_48 case could be optimized a bit if we cared
> to swap the order of its slot_idxs[] and isset[] arrays;
> then the initial zeroing could just go up to slot_idxs[].
> I don't know if there's any reason why the current order
> is preferable.)
IIUC there is no particular reason for the current order in RT_NODE_48.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com