Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id CAA4eK1KwrvP2TggtZaCBx9AuZ5e+4an0V5fC3zktarWMUfPkQw@mail.gmail.com
Whole thread Raw
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
List pgsql-hackers
On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
<haribabu.kommi@huawei.com> wrote:
> On 07 November 2013 09:42 Amit Kapila wrote:
>> I am not sure whether the same calculation as done for new_rel_tuples
>> works for new_dead_tuples, you can once check it.
>
> I didn't find any way to calculate new_dead_tuples like new_rel_tuples.
> I will check it.
>
>> I am thinking that if we have to do estimation anyway, then wouldn't it
>> be better to do the way Tom had initially suggested (Maybe we could
>> have VACUUM copy the n_dead_tuples value as it exists when VACUUM
>> starts, and then send that as the value to subtract when it's done?)
>>
>> I think the reason you gave that due to tuple visibility check the
>> number of dead tuples calculated by above logic is not accurate is
>> right but still it will make the value of dead tuples more appropriate
>> than it's current value.
>>
>> You can check if there is a way to do estimation of dead tuples similar
>> to new tuples, and it will be as solid as current logic of
>> vac_estimate_reltuples(), then it's okay, otherwise use the other
>> solution (using the value of n_dead_tuples at start of Vacuum) to solve
>> the problem.
>
> The two approaches calculations are approximation values only.
>
> 1. Taking a copy of n_dead_tuples before VACUUM starts and then subtract it once it is done.
>    This approach doesn't include the tuples which are remains during the vacuum operation.
     Wouldn't next or future vacuum's will make the estimate more appropraite?

> 2. nkeep counter contains the tuples which are still visible to other transactions.
>    This approach doesn't include tuples which are deleted on pages where vacuum operation is already finished.
>
> In my opinion the second approach gives the value nearer to the actual value,
> because it includes some of the new dead tuples also. Please correct me if anything wrong in my analysis.  I think
mainproblem in nkeep logic is to come up with an
 
estimation algorithm similar to live tuples.

By the way, do you have test case or can you try to write a test case
which can show this problem and
then after fix, you can verify if the problem is resolved.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Etsuro Fujita"
Date:
Subject: Improve code in tidbitmap.c
Next
From: Dilip kumar
Date:
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]