Re: Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Vacuum: allow usage of more than 1GB of work mem
Msg-id CAD21AoDV00Lpafaezg+4do8BeF9zA_U9Lr_UOcHtPYH1EEi6uw@mail.gmail.com
In response to Re: Vacuum: allow usage of more than 1GB of work mem  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Vacuum: allow usage of more than 1GB of work mem  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Sep 9, 2016 at 12:33 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Thu, Sep 8, 2016 at 11:40 PM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>>
>>
>> Making the vacuum possible to choose between two data representations
>> sounds good.
>> I implemented the patch that changes dead tuple representation to bitmap
>> before.
>> I will measure the performance of bitmap representation again and post
>> them.
>
>
> Sounds great! I haven't seen your patch, but what I would suggest is to
> compute page density (D) = relpages/(dead+live tuples) and experiment with
> bitmap of sizes of D to 2D bits per page. May I also suggest that instead of
> putting in efforts in implementing the overflow area,  just count how many
> dead TIDs would fall under overflow area for a given choice of bitmap size.
>

Shouldn't that formula be "page density (D) = (dead+live tuples)/relpages"?
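
To make the suggested experiment concrete, here is a minimal sketch in C of
the two calculations: the page density D = (dead+live tuples)/relpages, and
the number of dead TIDs that would land in the overflow area for a given
bitmap size. The function names, the round-up of D, and the convention that
offset k maps to bit k - 1 are my assumptions for illustration, not part of
any patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Average number of tuples per page, rounded up:
 *   D = (dead + live) / relpages
 * Rounding up is an illustrative choice so that D is never 0 for a
 * non-empty table.
 */
static unsigned
page_density(unsigned long dead, unsigned long live, unsigned long relpages)
{
    assert(relpages > 0);
    return (unsigned) ((dead + live + relpages - 1) / relpages);
}

/*
 * Count the dead TIDs whose offset number does not fit into a per-page
 * bitmap of bitmap_bits bits. Offset numbers are 1-based, so offset k
 * needs bit k - 1 and fits only if k <= bitmap_bits.
 */
static size_t
count_overflow(const unsigned short *dead_offsets, size_t n,
               unsigned bitmap_bits)
{
    size_t overflow = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (dead_offsets[i] > bitmap_bits)
            overflow++;
    }
    return overflow;
}
```

With made-up numbers (1000 pages, 55000 live and 5000 dead tuples) D is 60.
For dead offsets {3, 17, 58, 61, 75, 119, 121, 200}, a 60-bit bitmap would
leave 5 TIDs in the overflow area and a 120-bit (2D) bitmap would leave 2.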

> It might be a good idea to experiment with different vacuum scale factor,
> varying between 2% to 20% (may be 2, 5, 10, 20). You can probably run a
> longish pgbench test on a large table and then save the data directory for
> repeated experiments, although I'm not sure if pgbench will be a good choice
> because HOT will prevent accumulation of dead pointers, in which case you
> may try adding another index on abalance column.

Thank you, I will experiment with this.

>
> It'll be worth measuring memory consumption of both representations as well
> as performance implications on index vacuum. I don't expect to see any major
> difference in either heap scans.
>

Yeah, I expect the effect to show up in the index vacuum speed and in
the number of index vacuum passes.

The attached PoC patch changes the representation of dead tuple locations
to a hashmap of per-block tuple bitmaps.
Each hashmap entry consists of a block number and the TID bitmap of the
corresponding block; the block number is the hash key.
The current implementation is not smart yet: each hashmap entry allocates
its tuple bitmap with a fixed size (LAZY_ALLOC_TUPLES), so each entry can
store up to LAZY_ALLOC_TUPLES (291 if the block size is 8kB) tuples.
When a block holds only a few tens of tuples, most of the bits are
wasted.
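
For readers without the attachment, the layout described above can be
sketched roughly as follows. This is a standalone illustration, not the code
in the patch: the type and function names (DeadTupleEntry, DeadTupleMap,
dtm_*) are hypothetical, and it uses a simple open-addressing table with
malloc instead of PostgreSQL's own hash table and memory context facilities.
Only LAZY_ALLOC_TUPLES and its value of 291 for 8kB blocks come from the
description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define LAZY_ALLOC_TUPLES 291                      /* max tuples per 8kB block */
#define BITMAP_WORDS ((LAZY_ALLOC_TUPLES + 63) / 64)   /* 5 words = 40 bytes */

/* One entry per heap block that has dead tuples (hypothetical layout). */
typedef struct DeadTupleEntry
{
    uint32_t blkno;                 /* hash key: heap block number */
    bool     used;                  /* slot occupied? */
    uint64_t bitmap[BITMAP_WORDS];  /* bit (off - 1) set => offset off is dead */
} DeadTupleEntry;

typedef struct DeadTupleMap
{
    size_t          nslots;         /* table size, must be a power of two */
    size_t          nused;          /* number of occupied slots */
    DeadTupleEntry *slots;
} DeadTupleMap;

static DeadTupleMap *
dtm_create(size_t nslots)
{
    DeadTupleMap *map = malloc(sizeof(DeadTupleMap));

    assert(map != NULL && nslots > 0 && (nslots & (nslots - 1)) == 0);
    map->nslots = nslots;
    map->nused = 0;
    map->slots = calloc(nslots, sizeof(DeadTupleEntry));
    assert(map->slots != NULL);
    return map;
}

static void
dtm_destroy(DeadTupleMap *map)
{
    free(map->slots);
    free(map);
}

/* Linear probing; returns NULL if absent (insert == false) or table full. */
static DeadTupleEntry *
dtm_lookup(DeadTupleMap *map, uint32_t blkno, bool insert)
{
    size_t mask = map->nslots - 1;
    size_t i = (size_t) (blkno * 2654435761u) & mask;

    for (size_t probes = 0; probes < map->nslots; probes++, i = (i + 1) & mask)
    {
        DeadTupleEntry *e = &map->slots[i];

        if (e->used && e->blkno == blkno)
            return e;
        if (!e->used)
        {
            if (!insert)
                return NULL;
            e->used = true;
            e->blkno = blkno;
            map->nused++;
            return e;
        }
    }
    return NULL;
}

/* Record (blkno, off) as dead; false if off is out of range or table is full. */
static bool
dtm_record_dead(DeadTupleMap *map, uint32_t blkno, uint16_t off)
{
    DeadTupleEntry *e;

    if (off < 1 || off > LAZY_ALLOC_TUPLES)
        return false;
    e = dtm_lookup(map, blkno, true);
    if (e == NULL)
        return false;
    e->bitmap[(off - 1) / 64] |= UINT64_C(1) << ((off - 1) % 64);
    return true;
}

/* The lookup index vacuum would perform for each index tuple's heap TID. */
static bool
dtm_is_dead(DeadTupleMap *map, uint32_t blkno, uint16_t off)
{
    DeadTupleEntry *e;

    if (off < 1 || off > LAZY_ALLOC_TUPLES)
        return false;
    e = dtm_lookup(map, blkno, false);
    return e != NULL && ((e->bitmap[(off - 1) / 64] >> ((off - 1) % 64)) & 1);
}
```

In this sketch the bitmap alone takes 40 bytes per entry no matter how many
tuples the block actually holds, which is the waste mentioned above. Assuming
the 6-byte ItemPointerData used by the existing array representation, an
entry only pays off once a block has more than a handful of dead tuples.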

After improving this patch as you suggested, I will measure the performance benefit.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
