Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id CAA4eK1Lg+v_r1VdH68qyzHzfQjbfjotA=UurTGjW-bMnUKrORA@mail.gmail.com
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
List pgsql-hackers
On Tue, Oct 15, 2013 at 3:37 PM, Haribabu kommi
<haribabu.kommi@huawei.com> wrote:
> On 12 October 2013 11:30 Tom Lane wrote:
>>Haribabu kommi <haribabu.kommi@huawei.com> writes:
>>> To handle the above case, instead of directly resetting the dead tuples
>>> to zero, how about subtracting the exact number of removed dead tuples
>>> from the table stats? With this approach vacuum gets triggered
>>> frequently, thus reducing the bloat.
>
>>This does not seem like a very good idea as-is, because it will mean that
>>n_dead_tuples can diverge arbitrarily far from reality over time, as a
>>result of accumulation of errors.  It also doesn't seem like a very good
>>idea that VACUUM sets n_live_tuples while only adjusting n_dead_tuples
>>incrementally; ideally those counters should move in the same fashion.
>
>>In short, I think this patch will create at least as many problems as it fixes.
>
>>What would make more sense to me is for VACUUM to estimate the number of
>>remaining dead tuples somehow and send that in its message.  However, since
>>the whole point here is that we aren't accounting for transactions that
>>commit while VACUUM runs, it's not very clear how to do that.
>
>>Another way to look at it is that we want to keep any increments to
>>n_dead_tuples that occur after VACUUM takes its snapshot. Maybe we could
>>have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts,
>>and then send that as the value to subtract when it's done?
>
> Taking a copy of n_dead_tuples and passing it at the end of vacuum to
> subtract from the table stats may not be correct, as vacuum may not have
> cleaned all the dead tuples because of tuple visibility to other
> transactions. How about resetting n_dead_tuples to zero if it goes
> negative because of errors?
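
To make the options quoted above easier to compare, here is a minimal sketch of the counter arithmetic only. The function names are hypothetical and are not PostgreSQL functions; the clamp at zero follows the "reset to zero if it goes negative" idea from the quoted mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of PostgreSQL. */

/* Behaviour the thread starts from: the VACUUM report resets the counter. */
int64_t dead_after_reset(int64_t n_dead_tuples)
{
    (void) n_dead_tuples;       /* previous value is discarded */
    return 0;
}

/*
 * Proposed alternatives: subtract some value from the counter when VACUUM
 * finishes, and clamp at zero so accumulated errors cannot drive it negative.
 * to_subtract is either the number of tuples VACUUM actually removed, or the
 * copy of n_dead_tuples taken when VACUUM started.
 */
int64_t dead_after_subtract(int64_t n_dead_tuples, int64_t to_subtract)
{
    int64_t result;

    assert(to_subtract >= 0);
    result = n_dead_tuples - to_subtract;
    return (result < 0) ? 0 : result;
}
```

As an illustration with made-up numbers: the counter is 1000 when VACUUM starts, 300 more tuples die while it runs, and VACUUM removes 900 (100 are still visible to other transactions). Resetting gives 0, subtracting the removed count gives dead_after_subtract(1300, 900) = 400, and subtracting the starting copy gives dead_after_subtract(1300, 1000) = 300, which loses the 100 tuples that could not be cleaned.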

Wouldn't the way you are planning to change n_dead_tuples create an
inconsistency between n_live_tuples and n_dead_tuples? VACUUM would
have counted the non-deleted tuples as n_live_tuples, as per the code below:

if (tupgone)
{
    ..
    tups_vacuumed += 1;     /* tuple will be removed by this VACUUM */
    has_dead_tuples = true;
}
else
{
    num_tuples += 1;        /* tuple is kept and counted in num_tuples */
    hastup = true;
    ..
}

So now if we just subtract tuples_deleted from n_dead_tuples, the
tuples deleted while vacuum is running will be counted both as live
tuples and as dead tuples.

There is one statistic for dead row versions that cannot be removed
(nkeep). If we could use that to estimate the total remaining dead
tuples, then the solution would be in line with Tom's suggestion ("What
would make more sense to me is for VACUUM to estimate the number of
remaining dead tuples somehow and send that in its message.").
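
The double counting can be shown with a small model of the two counters. This is only a sketch under stated assumptions: TableStats and the three functions are hypothetical names, not PostgreSQL code, and it assumes the deleting transaction's stats arrive before the VACUUM report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-table counters; not a PostgreSQL struct. */
typedef struct TableStats
{
    int64_t n_live_tuples;
    int64_t n_dead_tuples;
} TableStats;

/* A transaction deletes tuples that VACUUM has already scanned as live. */
void apply_concurrent_delete(TableStats *s, int64_t deleted)
{
    assert(deleted >= 0);
    s->n_live_tuples -= deleted;
    s->n_dead_tuples += deleted;
}

/* Patch under discussion: set live from the scan, subtract removed tuples. */
void apply_vacuum_report_subtract(TableStats *s, int64_t num_tuples,
                                  int64_t tups_vacuumed)
{
    s->n_live_tuples = num_tuples;
    s->n_dead_tuples -= tups_vacuumed;
    if (s->n_dead_tuples < 0)
        s->n_dead_tuples = 0;
}

/* One reading of the nkeep idea: both counters come from the same scan. */
void apply_vacuum_report_estimate(TableStats *s, int64_t num_tuples,
                                  int64_t dead_estimate)
{
    s->n_live_tuples = num_tuples;
    s->n_dead_tuples = dead_estimate;
}
```

With made-up numbers: start from 1000 live and 200 dead, let VACUUM scan num_tuples = 1000 and tups_vacuumed = 200, then delete 50 already-scanned tuples before the report arrives. The subtract variant ends with 1000 live and 50 dead, so the 50 deleted tuples appear in both counters. The estimate variant with nkeep = 0 ends with 1000 live and 0 dead: the counters agree with the scan, but the concurrent deletes are still not accounted for, which is the difficulty Tom points out.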


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


