Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id CAA4eK1Kh1OXrGo20Na=3ZiRrui7KfPseXBxJ67pAmH6CueVBng@mail.gmail.com
Whole thread Raw
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
List pgsql-hackers
On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
<haribabu.kommi@huawei.com> wrote:
> On 25 November 2013 10:43 Amit Kapila wrote:
>> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
>> <haribabu.kommi@huawei.com> wrote:
>> > On 19 November 2013 10:33 Amit Kapila wrote:
>> >> If I understood correctly, then your patch's main intention is to
>> >> correct the estimate of dead tuples, so that it can lead to Vacuum
>> >> cleaning the table/index which otherwise is not happening as per
>> >> configuration value (autovacuum_vacuum_threshold) in some of the
>> >> cases, also it is not reducing the complete bloat (Unpatched -
>> 1532MB
>> >> ~Patched   - 1474MB), as the main reason of bloat is extra space in
>> >> index which can be reclaimed by reindex operation.
>> >>
>> >> So if above is correct then this patch has 3 advantages:
>> >> a. Extra Vacuum on table/index due to better estimation of dead
>> tuples.
>> >> b. Space reclaim due to this extra vacuum c. may be some performance
>> >> advantage as it will avoid the delay in cleaning dead tuples
>> >>
>> >> I think better way to test the patch is to see how much benefit is
>> >> there due to above (a and b points) advantages. Different values of
>> >> autovacuum_vacuum_threshold can be used to test.
>> >
>> >
>> > The performance effect of the patch is not much visible as I think
>> the
>> > analyze on the table estimates the number of dead tuples of the table
>> with some estimation.
>>
>>    Yes, that seems to be the reason why you are not seeing any
>> performance benefit, but still I think this is useful optimization to
>> do, as
>>    analyze updates both the livetuples and dead tuples and similarly
>> vacuum should also update both the counts. Do you see any reason
>>    why Vacuum should only update live tuples and not deadtuples?
>
> As vacuum touches all the pages where the dead tuples are present. This is not the
> Same with analyzer. Because of this reason, the analyzer estimates the dead tuples also.
> With the proposed patch the vacuum also estimates the dead tuples.

Few questions about your latest patch:
a. Is there any reason why you are doing estimation of dead tuples
only for Autovacuum and not for Vacuum.

/* clear and get the new stats for calculating proper dead tuples */
pgstat_clear_snapshot();
tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
b. In the above code, to get latest data you are first clearing
snapshot and then calling pgstat function. It will inturn perform I/O   (read of stats file) and send/receive message
fromstats collector
 
to ensure it can read latest data. I think it will add overhead   to Vacuum, especially if 'nkeep' calculated in
function
lazy_scan_heap() can serve the purpose. In my simple test[1], I
observed   that value of keep can serve the purpose.

Can you please once try the test on 'nkeep' approach patch.

>> > Because of this reason not much performance improvement is not
>> visible
>> > as the missed dead tuple calculation in vacuum is covered by the
>> analyze.
>>
>>    Yeah, so might be we can check once by configuring
>> analyze_threshold/scalefactor in a way that analyze doesn't get trigger
>> during your test.
>
> I ran the test for one hour with a high analyze_threshold and results are below.
>
>                 Auto vacuum count               Bloat size
> Master          15                      155MB
> Patched         23                      134MB
>
> Updated test script and configuration is attached in the mail.

I just had a brief look on your test, please check if you can simplify
your script file and make the test results to come in 15~20 mins.
Don't put too much effort on it, if you can do it easily then it is okay.

[1] Simple test case to verify the value of dead tuples:
Session-1
-----------------
a. Create table t1(c1 int);
b. insert into t1 values(generate_series(1,1000));
c. delete from t1;
d. Vacuum t1;
-- here I stopped in debugger, after fetching dead tuple count first
time (line 235, vacuumlazy.c, after applying your patch) as per your
code (modified a bit so that I can get the value for Vacuum)

Session-2
-----------------
a. insert into t1 values (generate_series(1000,1500));
b. delete from t1;

Session -1
-----------------
b. Verified the value of nkeep in lazy_scan_heap(), it is 501 which is
what we expect.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: logical changeset generation v6.7
Next
From: Amit Kapila
Date:
Subject: Re: TODO: Split out pg_resetxlog output into pre- and post-sections