On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM, <skoposov@cmu.edu> wrote:
> > From a quick look at the code, it looks to me like the bug is caused
> > by a 32-bit int overflow in the j=2*i+1 calculation inside
> > tuplesort_heap_siftup, which leads to negative values of j.
>
> It seems likely that the explanation is as simple as that. This
> happens during run generation with replacement selection. All versions
> are affected, but version 9.6+ is dramatically less likely to be
> affected, because replacement selection was all but killed in Postgres
> 9.6.
>
> This is an oversight in commit 263865a. The fix is to use a variable
> that won't overflow in tuplesort_heap_siftup() -- this is probably a
> one-liner, because when the variable overflows today, the correct
> behavior would be for control to break out of the loop that declares
> the overflowing variable "j". I also don't see any similar problem in
> other heap maintenance routines. It's a very isolated problem.
>
> I could write a patch.
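A minimal sketch of the overflow-safe loop shape described above, assuming a 32-bit int and a heap size that fits in an int (as memtupcount does in tuplesort.c). This is a toy min-heap of plain ints, not PostgreSQL's tuplesort code, and heap_siftdown, left_child_wide and left_child_overflows_int are hypothetical names. With unsigned int indices, 2*i+1 always fits, so the "j >= n" test ends the loop instead of j going negative. Signed int overflow is undefined behavior in C (in practice it typically wraps negative, matching the negative j reported above), so the sketch only shows the would-be value in 64-bit arithmetic and never performs the overflowing int computation.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: the left-child index of slot i, computed in 64-bit
 * arithmetic so we can see what a 32-bit "int j = 2*i+1" would need to hold.
 */
static int64_t left_child_wide(int i)
{
    return 2 * (int64_t) i + 1;
}

/* Hypothetical helper: true if 2*i+1 cannot be represented in an int. */
static int left_child_overflows_int(int i)
{
    return left_child_wide(i) > INT_MAX;
}

/*
 * Sift the element at slot 0 down a binary min-heap of n ints.
 * Assumes n <= INT_MAX and a 32-bit unsigned int: since i < n, 2*i+1 is at
 * most 2*(INT_MAX-1)+1 = UINT_MAX-2, so j never wraps and "j >= n" reliably
 * terminates the loop.
 */
static void heap_siftdown(int *heap, unsigned int n)
{
    unsigned int i = 0;     /* current position of the "hole" */
    int value;

    assert(n <= (unsigned int) INT_MAX);
    if (n == 0)
        return;
    value = heap[0];
    for (;;)
    {
        unsigned int j = 2 * i + 1;     /* left child, cannot overflow */

        if (j >= n)
            break;
        if (j + 1 < n && heap[j + 1] < heap[j])
            j++;                        /* pick the smaller child */
        if (value <= heap[j])
            break;
        heap[i] = heap[j];              /* move the child up into the hole */
        i = j;
    }
    heap[i] = value;
}
```

For example, left_child_wide(1073741824) is 2147483649, two more than INT_MAX, so a 32-bit signed j cannot hold the child index of slot 2^30.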
So that this isn't forgotten, I have attached a trivial patch against the
9.5 branch and created a commitfest submission:
https://commitfest.postgresql.org/14/1189/
The script below reproduces the bug (a segfault) and can be used to check
that the patch fixes it. It needs more than ~70 GB of RAM and 100+ GB of
disk space.
---
create table xx3 as
select generate_series as a
from generate_series(0,(1.5*((1::bigint)<<31))::bigint);
set maintenance_work_mem to '70GB';
create index on xx3(a);
----
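For reference, the arithmetic behind the overflow threshold, assuming a 32-bit int and 0-based heap slots as in j=2*i+1, with n standing for the number of tuples in the in-memory heap (memtupcount). This is only a necessary condition: j can overflow only once the heap holds more than 2^30 tuples, which is presumably why the reproduction needs both a very large table and a very large maintenance_work_mem. Whether that many tuples actually fit in memory depends on maintenance_work_mem and per-tuple overhead.

```latex
2i + 1 > \texttt{INT\_MAX} = 2^{31} - 1
  \iff i \ge 2^{30}
  \quad\Longrightarrow\quad n > i \ge 2^{30} = 1\,073\,741\,824

\text{rows generated} = 1.5 \cdot 2^{31} + 1 = 3\,221\,225\,473 > 2^{30}
```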
Hopefully somebody can take care of patching the other PG branches.
Sergey
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)