Re: Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers

From: Claudio Freire
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Msg-id: CAGTBQpZ8u9EGjeySyHZ3xUmsHKQmOdDLeXHUQ-pD5fADAE+SAg@mail.gmail.com
In response to: Re: Vacuum: allow usage of more than 1GB of work mem (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Sep 13, 2016 at 3:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 9, 2016 at 3:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> The attached PoC patch changes the representation of dead tuple
>> locations to a hashmap of per-block tuple bitmaps.
>> Each hashmap entry consists of a block number and the TID bitmap of
>> the corresponding block, with the block number as the hash key.
>> The current implementation of this patch is not smart yet, because
>> each hashmap entry allocates the tuple bitmap with a fixed size
>> (LAZY_ALLOC_TUPLES), so each hash entry can store up to
>> LAZY_ALLOC_TUPLES (291 if the block size is 8kB) tuples.
>> In cases where one block holds only several tens of tuples, most of
>> the bits would be wasted.
>>
>> After improving this patch as you suggested, I will measure the performance benefit.
>
> We also need to consider the amount of memory that gets used.  What I
> proposed - replacing the array with an array of arrays - would not
> increase memory utilization significantly.  I don't think it would
> have much impact on CPU utilization either.
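
For reference, the hashmap representation Sawada describes above boils
down to something like this (a sketch with hypothetical names, not the
actual PoC):

#include "postgres.h"
#include "access/htup_details.h"
#include "storage/itemptr.h"

/*
 * One entry per heap block.  MaxHeapTuplesPerPage is 291 with 8kB
 * blocks, so the fixed-size bitmap costs 37 bytes per entry no matter
 * how few dead tuples the block actually holds - that's the waste he
 * mentions.
 */
typedef struct DeadTupleEntry
{
    BlockNumber blkno;      /* hash key */
    uint8       bitmap[(MaxHeapTuplesPerPage + 7) / 8];
} DeadTupleEntry;

/* Mark a (1-based) line pointer offset on this entry's block as dead. */
static inline void
dead_tuple_set(DeadTupleEntry *entry, OffsetNumber off)
{
    entry->bitmap[(off - 1) / 8] |= (uint8) (1 << ((off - 1) % 8));
}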

I've finished writing that patch; I'm in the process of testing its CPU impact.

The first test seemed to hint at a 40% increase in CPU usage, which
seems rather steep compared to what I expected, so I'm trying to rule
out a methodology error here.
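
Roughly, the structure I'm testing looks like this (a simplified sketch
with hypothetical names; the actual patch differs in the details):

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * Dead tuples are kept in a small ordered set of independently
 * allocated segments.  Segment capacities grow exponentially, so even
 * at the 12GB ceiling only a handful of segments exist.
 */
typedef struct DeadTuplesSegment
{
    ItemPointerData *dt_tids;          /* sorted TIDs in this segment */
    int              num_dead_tuples;  /* TIDs stored so far */
    int              max_dead_tuples;  /* allocated capacity */
    ItemPointerData  last_dead_tuple;  /* highest TID, used for lookups */
} DeadTuplesSegment;

typedef struct DeadTuplesMultiArray
{
    DeadTuplesSegment *dt_segments;    /* ordered by TID range */
    int                num_segs;
} DeadTuplesMultiArray;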

> It would require
> replacing the call to bsearch() in lazy_tid_reaped() with an
> open-coded implementation of bsearch, or with one bsearch to find the
> chunk and another to find the TID within the chunk, but that shouldn't
> be very expensive.

I did a linear search to find the chunk, with exponentially growing
chunks, and then a bsearch to find the item inside the chunk.

With the typical number of segments and given the 12GB limit, the
segment array size is well within the range that favors linear search.
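
In code, the lookup amounts to something like this (hypothetical names;
vac_cmp_itemptr stands in for the TID comparator vacuum already uses):

static bool
multiarray_tid_reaped(DeadTuplesMultiArray *multi, ItemPointer itemptr)
{
    int i;

    for (i = 0; i < multi->num_segs; i++)
    {
        DeadTuplesSegment *seg = &multi->dt_segments[i];

        /*
         * Segments are ordered by TID range, so the first one whose
         * last TID is >= the probe is the only one that can hold it.
         */
        if (ItemPointerCompare(itemptr, &seg->last_dead_tuple) <= 0)
            return bsearch(itemptr,
                           seg->dt_tids,
                           seg->num_dead_tuples,
                           sizeof(ItemPointerData),
                           vac_cmp_itemptr) != NULL;
    }

    return false;
}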

> For example, we could keep a bitmap with one bit per K
> pages.  If the bit is set, there is at least 1 dead tuple on that
> page; if clear, there are none.  When we see an index tuple, we
> consult the bitmap to determine whether we need to search the TID
> list.  We select K to be the smallest power of 2 such that the bitmap
> uses less memory than some threshold, perhaps 64kB.

I've been pondering something like that, but that's an optimization
that's quite orthogonal to the multiarray stuff.
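
If someone wanted to try it, the filter itself is tiny (a sketch,
hypothetical names):

#include "postgres.h"
#include "storage/block.h"

#define FILTER_MAX_BYTES   (64 * 1024)  /* threshold from above */

static uint8  page_filter[FILTER_MAX_BYTES];
static uint32 pages_per_bit;            /* K: a power of two */

/* Pick the smallest power-of-two K that fits nblocks into the bitmap. */
static void
filter_init(BlockNumber nblocks)
{
    pages_per_bit = 1;
    while (nblocks / pages_per_bit > FILTER_MAX_BYTES * 8)
        pages_per_bit *= 2;
    memset(page_filter, 0, sizeof(page_filter));
}

/* Set the bit covering blkno when a dead tuple is recorded there. */
static inline void
filter_set(BlockNumber blkno)
{
    uint32 bit = blkno / pages_per_bit;
    page_filter[bit / 8] |= (uint8) (1 << (bit % 8));
}

/* A clear bit means the whole K-page range has no dead tuples. */
static inline bool
filter_probe(BlockNumber blkno)
{
    uint32 bit = blkno / pages_per_bit;
    return (page_filter[bit / 8] & (1 << (bit % 8))) != 0;
}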

>  Assuming that
> updates and deletes to the table have some locality, we should be able
> to skip a large percentage of the TID searches with a probe into this
> very compact bitmap.

I don't think you can assume locality.


