Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id CAA4eK1KpnqQb4R7BMiVHEyjivbch8wzniYSDQ-YAAFBi77Km6A@mail.gmail.com
Whole thread Raw
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running
List pgsql-hackers
On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
<haribabu.kommi@huawei.com> wrote:
> On 29 November 2013 12:00 Amit Kapila wrote:
>> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
>> <haribabu.kommi@huawei.com> wrote:
>> Few questions about your latest patch:
>> a. Is there any reason why you are doing estimation of dead tuples only
>> for Autovacuum and not for Vacuum.
>
> No, changed.
>
>> /* clear and get the new stats for calculating proper dead tuples */
>> pgstat_clear_snapshot(); tabentry =
>> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
>> b. In the above code, to get latest data you are first clearing
>> snapshot and then calling pgstat function. It will inturn perform I/O
>>     (read of stats file) and send/receive message from stats collector
>> to ensure it can read latest data. I think it will add overhead
>>     to Vacuum, especially if 'nkeep' calculated in function
>> lazy_scan_heap() can serve the purpose. In my simple test[1], I
>> observed
>>     that value of keep can serve the purpose.
>>
>> Can you please once try the test on 'nkeep' approach patch.
>
> Using the nkeep and snapshot approach, I ran the test for 40 mins with a
> high analyze_threshold and results are below.
>
>                         Auto vacuum count               Bloat size
> Master           11                      220MB
> Patched_nkeep      14                      215MB
> Patched_snapshot           18                              198MB
>
> Both the approaches are showing good improvement in the test.
> Updated patches, test script and configuration is attached in the mail.

I think updating dead tuple count using nkeep is good idea as similar
thing is done for Analyze as well in acquire_sample_rows().
One minor point, I think it is better to log dead tuples is below error message:
ereport(LOG,
(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
"pages: %d removed, %d remain\n"
"tuples: %.0f removed, %.0f remain\n"

"tuples: %.0f removed, %.0f remain, %.0f dead\n"


About your test, how to collect the data by running this script, are
you manually stopping it after 40 mins, because I ran it for more than
an hour,
the final result didn't came.
As I mentioned you last time, please simplify your test, for other
person in its current form, it is difficult to make meaning out of it.
Write comments on top of it in steps form to explain what exactly it
is doing and how to take data using it (for example, do I need
to wait, till script ends; how long this test can take to complete).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [PATCH] Add transforms feature
Next
From: Haribabu kommi
Date:
Subject: Re: Performance Improvement by reducing WAL for Update Operation