
From: Masahiko Sawada
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Msg-id: CAD21AoCnT3pZk=nO+L4mFxJZfAgfC4wRWZ=X9qrKoEkvr0FAZw@mail.gmail.com
In response to: Re: Vacuum: allow usage of more than 1GB of work mem (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List: pgsql-hackers
On Wed, Sep 7, 2016 at 2:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 10:28 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> The problem with this is that we allocate the entire amount of
>>> maintenance_work_mem even when the number of actual dead tuples turns
>>> out to be very small.  That's not so bad if the amount of memory we're
>>> potentially wasting is limited to ~1 GB, but it seems pretty dangerous
>>> to remove the 1 GB limit, because somebody might have
>>> maintenance_work_mem set to tens or hundreds of gigabytes to speed
>>> index creation, and allocating that much space for a VACUUM that
>>> encounters 1 dead tuple does not seem like a good plan.
>>>
>>> What I think we need to do is make some provision to initially
>>> allocate only a small amount of memory and then grow the allocation
>>> later if needed.  For example, instead of having
>>> vacrelstats->dead_tuples be declared as ItemPointer, declare it as
>>> ItemPointer * and allocate the array progressively in segments.  I'd
>>> actually argue that the segment size should be substantially smaller
>>> than 1 GB, like say 64MB; there are still some people running systems
>>> which are small enough that allocating 1 GB when we may need only 6
>>> bytes can drive the system into OOM.
>>
>> This would, however, incur the cost of copying the whole GB-sized
>> chunk every time it's expanded. It wouldn't be cheap.
>
> No, I don't want to end up copying the whole array; that's what I
> meant by allocating it progressively in segments.  Something like what
> you go on to propose.
>
>> I've monitored the vacuum as it runs and the OS doesn't map the whole
>> block unless it's touched, which it isn't until dead tuples are found.
>> Sure, if overcommit is disabled (as it should be), it could exhaust the
>> virtual address space if set very high, but it wouldn't really use the
>> memory unless it's needed, it would merely reserve it.
>
> Yeah, but I've seen actual breakage from exactly this issue on
> customer systems even with the 1GB limit, and when we start allowing
> 100GB it's going to get a whole lot worse.
>
>> To fix that, rather than repalloc the whole thing, dead_tuples would
>> have to be an ItemPointer** of sorted chunks. That'd be a
>> significantly more complex patch, but at least it wouldn't incur the
>> memcpy.
>
> Right, this is what I had in mind.  I don't think this is actually
> very complicated, because the way we use this array is really simple.
> We basically just keep appending to the array until we run out of
> space, and that's not very hard to implement with an array-of-arrays.
> The chunks are, in some sense, sorted, as you say, but you don't need
> to do qsort() or anything like that.  You're just replacing a single
> flat array with a data structure that can be grown incrementally in
> fixed-size chunks.
>
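
For concreteness, here is a rough sketch of the segmented structure as I
understand the proposal (the names are mine, purely for illustration;
this is not from any posted patch):

    /*
     * Rough sketch of a segmented dead-tuple array (illustrative names
     * only).  TIDs are appended into fixed-size chunks; growing the
     * structure only repallocs the small chunk-pointer array, never the
     * chunks themselves, so no GB-sized memcpy is needed.
     */
    #define CHUNK_BYTES    (64 * 1024 * 1024)   /* 64MB segments */
    #define CHUNK_TUPLES   ((long) (CHUNK_BYTES / sizeof(ItemPointerData)))

    typedef struct DeadTupleSegments
    {
        ItemPointer *chunks;      /* array of chunk pointers */
        int          nchunks;     /* chunks allocated so far */
        int          maxchunks;   /* capacity of the chunks[] array */
        long         ntuples;     /* total dead tuples recorded */
    } DeadTupleSegments;

    static void
    record_dead_tuple(DeadTupleSegments *dt, ItemPointer itemptr)
    {
        long    chunkno = dt->ntuples / CHUNK_TUPLES;
        long    offset = dt->ntuples % CHUNK_TUPLES;

        if (chunkno >= dt->nchunks)
        {
            if (chunkno >= dt->maxchunks)
            {
                /* Double the pointer array; old chunks are untouched. */
                int     newmax = Max(dt->maxchunks * 2, 8);

                dt->chunks = (dt->chunks == NULL)
                    ? palloc(newmax * sizeof(ItemPointer))
                    : repalloc(dt->chunks, newmax * sizeof(ItemPointer));
                dt->maxchunks = newmax;
            }
            dt->chunks[chunkno] = palloc(CHUNK_BYTES);
            dt->nchunks++;
        }

        dt->chunks[chunkno][offset] = *itemptr;
        dt->ntuples++;
    }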

If we replaced dead_tuples with an array of arrays like this, wouldn't
there be a negative performance impact on lazy_tid_reap()? As more
chunks are added, that lookup would get slower.
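
To make that concern concrete, here is what a lookup over such a
structure might look like (again only an illustrative sketch, building
on the struct above; vac_cmp_itemptr is the comparator vacuumlazy.c
already uses for the flat array):

    /*
     * Hypothetical lookup over the sketched structure.  VACUUM collects
     * TIDs in heap order, so each chunk is internally sorted and the
     * chunks are ordered among themselves: binary-search for the chunk
     * whose range covers the TID, then bsearch() inside it, as
     * lazy_tid_reaped() does today.
     */
    static bool
    lazy_tid_reaped_segmented(ItemPointer itemptr, DeadTupleSegments *dt)
    {
        int     lo = 0;
        int     hi = dt->nchunks - 1;
        int     chunkno = -1;
        long    ntup;

        /* Find the last chunk whose first TID is <= the probe TID. */
        while (lo <= hi)
        {
            int     mid = (lo + hi) / 2;

            if (ItemPointerCompare(&dt->chunks[mid][0], itemptr) <= 0)
            {
                chunkno = mid;
                lo = mid + 1;
            }
            else
                hi = mid - 1;
        }

        if (chunkno < 0)
            return false;       /* smaller than every recorded TID */

        /* Only the last chunk can be partially filled. */
        ntup = (chunkno == dt->nchunks - 1)
            ? dt->ntuples - (long) chunkno * CHUNK_TUPLES
            : CHUNK_TUPLES;

        return bsearch(itemptr, dt->chunks[chunkno], ntup,
                       sizeof(ItemPointerData), vac_cmp_itemptr) != NULL;
    }

The extra work per lookup is the outer search over the chunk pointers,
which grows with the number of chunks; that is the overhead I'm
wondering about.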

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


