Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id CAA4eK1K6tsLCT9hAxV0iKw8nsaz-UTVuyc27j4HO86S0rAEiXQ@mail.gmail.com
Whole thread Raw
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running
List pgsql-hackers
On Tue, Oct 22, 2013 at 2:09 PM, Haribabu kommi
<haribabu.kommi@huawei.com> wrote:
> On 22 October 2013 10:15 Amit Kapila wrote:
>>>On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi <haribabu.kommi@huawei.com> wrote:
>>>
>>Actually what I had in mind was to use nkeep to estimate n_dead_tuples similar to how num_tuples is used to estimate
n_live_tuples.I think it will match what Tom had pointed in his response (>>>>What
 
>>>would make more sense to me is for VACUUM to estimate the number
>>>>>>of remaining dead tuples somehow and send that in its message.
>>>>>>However, since the whole point here is that we aren't accounting for
>>>>>>transactions that commit while VACUUM runs, it's not very clear how
>>>>>>to do that.)
>
> I changed the patch as passing the "nkeep" counter data as the new dead tuples in the relation to stats like the
new_rel_tuples.
> The "nkeep" counter is an approximation of dead tuples data of a relation.
> Instead of resetting dead tuples stats as zero, used this value to set n_dead_tuples same as n_live_tuples.

Directly using nkeep might not work as it is not guaranteed that
Vacuum will scan all the pages, we need to estimate the value similar
to new_rel_tuples, something like as done in below function:

/* now we can compute the new value for pg_class.reltuples */
vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
       nblocks,
       vacrelstats->scanned_pages,
       num_tuples);

I am not sure whether the same calculation as done for new_rel_tuples
works for new_dead_tuples, you can once check it.

I am thinking that if we have to do estimation anyway, then wouldn't
it be better to do the way Tom had initially suggested (Maybe we could
have VACUUM copy the n_dead_tuples value as it exists when VACUUM
starts, and then send that as the value to subtract when it's done?)

I think the reason you gave that due to tuple visibility check the
number of dead tuples calculated by above logic is not accurate is
right but still it will make the value of dead tuples more appropriate
than it's current value.

You can check if there is a way to do estimation of dead tuples
similar to new tuples, and it will be as solid as current logic of
vac_estimate_reltuples(), then it's okay, otherwise use the other
solution (using the value of n_dead_tuples at start of Vacuum) to
solve the problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Documentation patch for date/time formatting functions
Next
From: David Rowley
Date:
Subject: patch to fix unused variable warning on windows build