Bug in 9.6 tuplesort batch memory growth logic - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Bug in 9.6 tuplesort batch memory growth logic
Date
Msg-id CAM3SWZRVkuUB68DbAkgw=532gW0f+fofKueAMsY7hVYi68MuYQ@mail.gmail.com
Whole thread Raw
Responses Re: Bug in 9.6 tuplesort batch memory growth logic
List pgsql-hackers
While working on my parallel CREATE INDEX patch, I came across a
problem. I initially assumed that what I saw was just a bug in my
development branch. During a final merge in a parallel worker, with
very little maintenance_work_mem, workers attempted to allocate an
amount of memory slightly less than 2 ^ 64, and fail, since that
exceeds MaxAllocHugeSize. I noticed that this happened immediately
following batchmemtuples() leaving us in a LACKMEM() state due to a
fault in its calculation (or, if you prefer, the lack of any defense
against that fault).

Further analysis led me to believe that this is a 9.6 bug. We should
have a check for sane memtuples growth, in line with the
grow_memtuples() check, where "We need to be sure that we do not cause
LACKMEM to become true, else the space management algorithm will go
nuts".  Because of the way grow_memtuples() is called by
batchmemtuples() in practice (in particular, because
availMemLessRefund may be < 0 in these corner cases),
grow_memtuples()'s own protections may not save us as previously
assumed. FREEMEM() *takes away* memory from the availMem budget when
"availMemLessRefund < 0", just after the conventional grow_memtuples()
checks run.

Attached patch adds a defense. FWIW, there is no reason to think that
more careful accounting could fix this problem, since in general a
LACKMEM() condition may not immediately lead to tuples being dumped
out.

I haven't been able to reproduce this on 9.6, but that seems
unnecessary. I can reproduce it with my parallel CREATE INDEX patch
applied, with just the right test case and right number of workers
(it's rather delicate). After careful consideration, I can think of no
reason why 9.6 would be unaffected.

--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Steve Singer
Date:
Subject: Re: Logical Replication WIP
Next
From: Tom Lane
Date:
Subject: Re: Showing parallel status in \df+