Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes.
Date
Msg-id CAH2-Wz=SRX3uPYghPoP4oQwq8zwyBxCuA+zEpAk2PoCR_eOaMg@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Avoid valgrind complaint about write() of uninitalizedbytes.  (Andres Freund <andres@anarazel.de>)
Responses Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes.
Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes.
List pgsql-hackers
On Tue, Feb 20, 2018 at 9:34 PM, Andres Freund <andres@anarazel.de> wrote:
> Doesn't appear to have fixed the problem entirely:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-02-20%2017%3A10%3A01
>
> relevant excerpt:
> ==12452== Syscall param write(buf) points to uninitialised byte(s)
> ==12452==    at 0x4E49C64: write (write.c:26)
> ==12452==    by 0x4BF8BF: FileWrite (fd.c:2017)
> ==12452==    by 0x4C1B69: BufFileDumpBuffer (buffile.c:513)
> ==12452==    by 0x4C1C61: BufFileFlush (buffile.c:657)
> ==12452==    by 0x4C21D6: BufFileRead (buffile.c:561)
> ==12452==    by 0x63ADA8: ltsReadBlock (logtape.c:274)
> ==12452==    by 0x63AEF6: ltsReadFillBuffer (logtape.c:304)
> ==12452==    by 0x63B560: LogicalTapeRewindForRead (logtape.c:771)
> ==12452==    by 0x640567: mergeruns (tuplesort.c:2671)
> ==12452==    by 0x645122: tuplesort_performsort (tuplesort.c:1866)
> ==12452==    by 0x23357D: _bt_parallel_scan_and_sort (nbtsort.c:1626)
> ==12452==    by 0x234F71: _bt_parallel_build_main (nbtsort.c:1527)

> Note that the above path doesn't appear to go through
> LogicalTapeFreeze(), therfore not hitting the VALGRIND_MAKE_MEM_DEFINED
> added in the above commit.

Sure, but it looks like it has the exact same underlying cause to the
LogicalTapeFreeze() issue. It shouldn't be very hard to write an
equivalent patch for LogicalTapeRewindForRead() -- I pointed out that
this could happen there instead before the first patch went in, in
fact. My mistake was to imagine that that could never happen during
the regression tests.

I think I see why this happened, and why it happens inconsistently on
skink. Because memtuples is generally much smaller during merging than
it is during initial run generation, you can have enough tuples to
have to spill a run to disk, and yet not enough to fully fill even a
small buffer during the subsequent merge (logtape.c buffers also don't
have per-tuple palloc() header overhead, which is a huge proportion of
total memory used for an int4 primary key CREATE INDEX). This failure
is sensitive to the scheduling of worker processes, which is why we
only see it occasionally on skink.

The multiple-row-versions isolation test, which is where we see the
failure, seems to want to use multiple parallel workers for its ALTER
TABLE ... ADD PRIMARY KEY, in part because the table fillfactor is
rather low -- you end up with a case that gets a parallel sort, and
yet still has very few tuples to sort, a bit like the
LogicalTapeFreeze() issue already fixed.

Should we even be doing a parallel external sort here? It's hardly
part of an intentional effort to test the code. I had to push to get
us to give external sorts test coverage at one point about 18 months
ago, because of concerns about the overhead/duration of external
sorts. Not that I feel strongly about this myself.

-- 
Peter Geoghegan


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Add PGDLLIMPORT to enable_hashagg
Next
From: "Rady, Doug"
Date:
Subject: Re: [PATCH] pgbench - refactor some connection finish/null intocommon function