Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Haribabu kommi
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id 8977CB36860C5843884E0A18D8747B0372BF0B5B@szxeml558-mbs.china.huawei.com
Whole thread Raw
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 29 November 2013 12:00 Amit Kapila wrote:
> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
> <haribabu.kommi@huawei.com> wrote:
> > On 25 November 2013 10:43 Amit Kapila wrote:
> >> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
> >> <haribabu.kommi@huawei.com> wrote:
> >> > On 19 November 2013 10:33 Amit Kapila wrote:
> >> >> If I understood correctly, then your patch's main intention is to
> >> >> correct the estimate of dead tuples, so that it can lead to
> Vacuum
> >> >> cleaning the table/index which otherwise is not happening as per
> >> >> configuration value (autovacuum_vacuum_threshold) in some of the
> >> >> cases, also it is not reducing the complete bloat (Unpatched -
> >> 1532MB
> >> >> ~Patched   - 1474MB), as the main reason of bloat is extra space
> in
> >> >> index which can be reclaimed by reindex operation.
> >> >>
> >> >> So if above is correct then this patch has 3 advantages:
> >> >> a. Extra Vacuum on table/index due to better estimation of dead
> >> tuples.
> >> >> b. Space reclaim due to this extra vacuum c. may be some
> >> >> performance advantage as it will avoid the delay in cleaning dead
> >> >> tuples
> >> >>
> >> >> I think better way to test the patch is to see how much benefit
> is
> >> >> there due to above (a and b points) advantages. Different values
> >> >> of autovacuum_vacuum_threshold can be used to test.
> >> >
> >> >
> >> > The performance effect of the patch is not much visible as I think
> >> the
> >> > analyze on the table estimates the number of dead tuples of the
> >> > table
> >> with some estimation.
> >>
> >>    Yes, that seems to be the reason why you are not seeing any
> >> performance benefit, but still I think this is useful optimization
> to
> >> do, as
> >>    analyze updates both the livetuples and dead tuples and similarly
> >> vacuum should also update both the counts. Do you see any reason
> >>    why Vacuum should only update live tuples and not deadtuples?
> >
> > As vacuum touches all the pages where the dead tuples are present.
> > This is not the Same with analyzer. Because of this reason, the
> analyzer estimates the dead tuples also.
> > With the proposed patch the vacuum also estimates the dead tuples.
>
> Few questions about your latest patch:
> a. Is there any reason why you are doing estimation of dead tuples only
> for Autovacuum and not for Vacuum.

No, changed.

> /* clear and get the new stats for calculating proper dead tuples */
> pgstat_clear_snapshot(); tabentry =
> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
> b. In the above code, to get latest data you are first clearing
> snapshot and then calling pgstat function. It will inturn perform I/O
>     (read of stats file) and send/receive message from stats collector
> to ensure it can read latest data. I think it will add overhead
>     to Vacuum, especially if 'nkeep' calculated in function
> lazy_scan_heap() can serve the purpose. In my simple test[1], I
> observed
>     that value of keep can serve the purpose.
>
> Can you please once try the test on 'nkeep' approach patch.

Using the nkeep and snapshot approach, I ran the test for 40 mins with a
high analyze_threshold and results are below.

                    Auto vacuum count        Bloat size
Master              11                      220MB
Patched_nkeep       14                      215MB
Patched_snapshot       18                   198MB

Both the approaches are showing good improvement in the test.
Updated patches, test script and configuration is attached in the mail.

Regards,
Hari babu.


Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: commit fest 2013-11 week 2 report
Next
From: Rajeev rastogi
Date:
Subject: Re: COPY table FROM STDIN doesn't show count tag