Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Haribabu kommi
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id 8977CB36860C5843884E0A18D8747B0372BEFFE0@szxeml558-mbs.china.huawei.com
Whole thread Raw
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 25 November 2013 10:43 Amit Kapila wrote:
> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
> <haribabu.kommi@huawei.com> wrote:
> > On 19 November 2013 10:33 Amit Kapila wrote:
> >> If I understood correctly, then your patch's main intention is to
> >> correct the estimate of dead tuples, so that it can lead to Vacuum
> >> cleaning the table/index which otherwise is not happening as per
> >> configuration value (autovacuum_vacuum_threshold) in some of the
> >> cases, also it is not reducing the complete bloat (Unpatched -
> 1532MB
> >> ~Patched   - 1474MB), as the main reason of bloat is extra space in
> >> index which can be reclaimed by reindex operation.
> >>
> >> So if above is correct then this patch has 3 advantages:
> >> a. Extra Vacuum on table/index due to better estimation of dead
> tuples.
> >> b. Space reclaim due to this extra vacuum c. may be some performance
> >> advantage as it will avoid the delay in cleaning dead tuples
> >>
> >> I think better way to test the patch is to see how much benefit is
> >> there due to above (a and b points) advantages. Different values of
> >> autovacuum_vacuum_threshold can be used to test.
> >
> >
> > The performance effect of the patch is not much visible as I think
> the
> > analyze on the table estimates the number of dead tuples of the table
> with some estimation.
>
>    Yes, that seems to be the reason why you are not seeing any
> performance benefit, but still I think this is useful optimization to
> do, as
>    analyze updates both the livetuples and dead tuples and similarly
> vacuum should also update both the counts. Do you see any reason
>    why Vacuum should only update live tuples and not deadtuples?

As vacuum touches all the pages where the dead tuples are present. This is not the
Same with analyzer. Because of this reason, the analyzer estimates the dead tuples also.
With the proposed patch the vacuum also estimates the dead tuples.

> > Because of this reason not much performance improvement is not
> visible
> > as the missed dead tuple calculation in vacuum is covered by the
> analyze.
>
>    Yeah, so might be we can check once by configuring
> analyze_threshold/scalefactor in a way that analyze doesn't get trigger
> during your test.

I ran the test for one hour with a high analyze_threshold and results are below.

        Auto vacuum count        Bloat size
Master        15            155MB
Patched        23            134MB

Updated test script and configuration is attached in the mail.

Regards,
Hari babu.

Attachment

pgsql-hackers by date:

Previous
From: Marko Tiikkaja
Date:
Subject: Re: psql shows line number
Next
From: Sawada Masahiko
Date:
Subject: Re: Logging WAL when updating hintbit