Re: BUG #17619: AllocSizeIsValid violation in parallel hash join - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #17619: AllocSizeIsValid violation in parallel hash join
Date
Msg-id CA+hUKGKu3xSP7JsRGHw0d2Lxe_e4Y-3bf_1dkgAZj7xcsG=q1w@mail.gmail.com
In response to BUG #17619: AllocSizeIsValid violation in parallel hash join  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #17619: AllocSizeIsValid violation in parallel hash join
List pgsql-bugs
On Thu, Sep 22, 2022 at 7:46 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
> (gdb) p size
> $2 = 1702125924

Thanks for the detailed report.  Hmm.  That size, read as an 8-byte
integer on a little-endian system, is the byte sequence
"date\0\0\0\0", which looks suspiciously like the inside of a tuple
and not its size.  We must have got out of sync somehow.
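
To make the byte-sequence observation easy to check, here is a minimal
sketch.  It assumes a 64-bit little-endian size_t and uses PostgreSQL's
MaxAllocSize limit of 0x3fffffff.  The length-prefixed record at the end
is a hypothetical toy layout, not the real parallel hash join file
format; it only illustrates how reading a size from the wrong offset
turns tuple data into a bogus size.

```python
import struct

MAX_ALLOC_SIZE = 0x3FFFFFFF  # PostgreSQL's MaxAllocSize (1 GB - 1)

size = 1702125924  # value printed by gdb in the report

# Assumption: size_t is 8 bytes and the machine is little-endian.
raw = struct.pack("<Q", size)
print(raw)                    # b'date\x00\x00\x00\x00'
print(size > MAX_ALLOC_SIZE)  # True, so AllocSizeIsValid() would fail

# Hypothetical toy stream: an 8-byte length prefix followed by the body.
# This is NOT PostgreSQL's on-disk layout, just an illustration.
record_body = b"date\x00\x00\x00\x00rest-of-tuple"
stream = struct.pack("<Q", len(record_body)) + record_body


def read_size(buf, offset):
    """Interpret 8 bytes at offset as a little-endian length prefix."""
    return struct.unpack_from("<Q", buf, offset)[0]


good = read_size(stream, 0)  # in sync: the real length prefix
bad = read_size(stream, 8)   # out of sync: tuple bytes read as a size
print(good, bad)
```

The out-of-sync read yields exactly the value seen in gdb, which is
consistent with (but does not prove) the reader having lost its place.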

> Potentially interesting piece of the puzzle is that there are some long
> outliers in rhs.payload and rhs.source, but the rest of the columns have
> values that are exactly of avg_width bytes:
>
> # select log_len, count(*) from (select log(length(payload))::int as log_len
> from rhs) foo group by 1 order by 2 desc;
>  log_len │ count
> ─────────┼────────
>        3 │ 840852
>        4 │  77776
>        5 │   8003
>        6 │   1317
>        7 │     20
> (5 rows)

So there are some strings with lengths on the order of 10^7 in there.
The file format consists of chunks, with a special case for tuples that
don't fit in one chunk.  Perhaps there is a bug in that logic.  It is
exercised in our regression tests, but maybe not thoroughly enough.
I'll try to reproduce this from your clues.
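
For reference, a rough sketch of what each log_len bucket in the quoted
histogram means in terms of payload length.  It assumes the ::int cast
rounds log10(length) to the nearest integer; the helper name
length_range is made up for this illustration.

```python
import math


def length_range(log_len):
    """Return (lo, hi): the lengths whose log10 rounds to log_len.

    Assumes round-to-nearest; the boundaries 10**(k +/- 0.5) are never
    integers, so tie-breaking rules do not matter here.
    """
    lo = math.ceil(10 ** (log_len - 0.5))
    hi = math.floor(10 ** (log_len + 0.5))
    return lo, hi


# Buckets 3..7 are the ones that appear in the reported histogram.
for bucket in range(3, 8):
    lo, hi = length_range(bucket)
    print(f"log_len {bucket}: lengths {lo} .. {hi}")
```

Under that assumption, the 20 rows in bucket 7 have payloads of roughly
3 to 32 million characters, far larger than the typical row, so they
would be the ones hitting the special case for tuples that don't fit in
one chunk.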


