Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes. - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes. |
Date | |
Msg-id | CAH2-Wz=SRX3uPYghPoP4oQwq8zwyBxCuA+zEpAk2PoCR_eOaMg@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql: Avoid valgrind complaint about write() of uninitalizedbytes. (Andres Freund <andres@anarazel.de>) |
Responses |
Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes.
Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes. |
List | pgsql-hackers |
On Tue, Feb 20, 2018 at 9:34 PM, Andres Freund <andres@anarazel.de> wrote: > Doesn't appear to have fixed the problem entirely: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-02-20%2017%3A10%3A01 > > relevant excerpt: > ==12452== Syscall param write(buf) points to uninitialised byte(s) > ==12452== at 0x4E49C64: write (write.c:26) > ==12452== by 0x4BF8BF: FileWrite (fd.c:2017) > ==12452== by 0x4C1B69: BufFileDumpBuffer (buffile.c:513) > ==12452== by 0x4C1C61: BufFileFlush (buffile.c:657) > ==12452== by 0x4C21D6: BufFileRead (buffile.c:561) > ==12452== by 0x63ADA8: ltsReadBlock (logtape.c:274) > ==12452== by 0x63AEF6: ltsReadFillBuffer (logtape.c:304) > ==12452== by 0x63B560: LogicalTapeRewindForRead (logtape.c:771) > ==12452== by 0x640567: mergeruns (tuplesort.c:2671) > ==12452== by 0x645122: tuplesort_performsort (tuplesort.c:1866) > ==12452== by 0x23357D: _bt_parallel_scan_and_sort (nbtsort.c:1626) > ==12452== by 0x234F71: _bt_parallel_build_main (nbtsort.c:1527) > Note that the above path doesn't appear to go through > LogicalTapeFreeze(), therfore not hitting the VALGRIND_MAKE_MEM_DEFINED > added in the above commit. Sure, but it looks like it has the exact same underlying cause to the LogicalTapeFreeze() issue. It shouldn't be very hard to write an equivalent patch for LogicalTapeRewindForRead() -- I pointed out that this could happen there instead before the first patch went in, in fact. My mistake was to imagine that that could never happen during the regression tests. I think I see why this happened, and why it happens inconsistently on skink. Because memtuples is generally much smaller during merging than it is during initial run generation, you can have enough tuples to have to spill a run to disk, and yet not enough to fully fill even a small buffer during the subsequent merge (logtape.c buffers also don't have per-tuple palloc() header overhead, which is a huge proportion of total memory used for an int4 primary key CREATE INDEX). This failure is sensitive to the scheduling of worker processes, which is why we only see it occasionally on skink. The multiple-row-versions isolation test, which is where we see the failure, seems to want to use multiple parallel workers for its ALTER TABLE ... ADD PRIMARY KEY, in part because the table fillfactor is rather low -- you end up with a case that gets a parallel sort, and yet still has very few tuples to sort, a bit like the LogicalTapeFreeze() issue already fixed. Should we even be doing a parallel external sort here? It's hardly part of an intentional effort to test the code. I had to push to get us to give external sorts test coverage at one point about 18 months ago, because of concerns about the overhead/duration of external sorts. Not that I feel strongly about this myself. -- Peter Geoghegan
pgsql-hackers by date: