Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers

From Haribabu kommi
Subject Re: Heavily modified big table bloat even in auto vacuum is running
Date
Msg-id 8977CB36860C5843884E0A18D8747B0372BD78B4@szxeml558-mbs.china.huawei.com
Whole thread Raw
In response to Re: Heavily modified big table bloat even in auto vacuum is running  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Heavily modified big table bloat even in auto vacuum is running
List pgsql-hackers
On 07 November 2013 09:42 Amit Kapila wrote:
> On Tue, Oct 22, 2013 at 2:09 PM, Haribabu kommi
> <haribabu.kommi@huawei.com> wrote:
> > On 22 October 2013 10:15 Amit Kapila wrote:
> >>>On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi
> <haribabu.kommi@huawei.com> wrote:
> >>>
> >>Actually what I had in mind was to use nkeep to estimate
> n_dead_tuples
> >>similar to how num_tuples is used to estimate n_live_tuples. I think
> >>it will match what Tom had pointed in his response (>>>>What
> >>>would make more sense to me is for VACUUM to estimate the number
> >>>>>>of remaining dead tuples somehow and send that in its message.
> >>>>>>However, since the whole point here is that we aren't accounting
> >>>>>>for transactions that commit while VACUUM runs, it's not very
> >>>>>>clear how to do that.)
> >
> > I changed the patch as passing the "nkeep" counter data as the new
> dead tuples in the relation to stats like the new_rel_tuples.
> > The "nkeep" counter is an approximation of dead tuples data of a
> relation.
> > Instead of resetting dead tuples stats as zero, used this value to
> set n_dead_tuples same as n_live_tuples.
>
> Directly using nkeep might not work as it is not guaranteed that Vacuum
> will scan all the pages, we need to estimate the value similar to
> new_rel_tuples, something like as done in below function:
>
> /* now we can compute the new value for pg_class.reltuples */
> vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
>
>         nblocks,
>
>         vacrelstats->scanned_pages,
>
>         num_tuples);
>
> I am not sure whether the same calculation as done for new_rel_tuples
> works for new_dead_tuples, you can once check it.

I didn't find any way to calculate new_dead_tuples like new_rel_tuples.
I will check it.
> I am thinking that if we have to do estimation anyway, then wouldn't it
> be better to do the way Tom had initially suggested (Maybe we could
> have VACUUM copy the n_dead_tuples value as it exists when VACUUM
> starts, and then send that as the value to subtract when it's done?)
>
> I think the reason you gave that due to tuple visibility check the
> number of dead tuples calculated by above logic is not accurate is
> right but still it will make the value of dead tuples more appropriate
> than it's current value.
>
> You can check if there is a way to do estimation of dead tuples similar
> to new tuples, and it will be as solid as current logic of
> vac_estimate_reltuples(), then it's okay, otherwise use the other
> solution (using the value of n_dead_tuples at start of Vacuum) to solve
> the problem.

The two approaches calculations are approximation values only.

1. Taking a copy of n_dead_tuples before VACUUM starts and then subtract it once it is done.  This approach doesn't
includethe tuples which are remains during the vacuum operation. 

2. nkeep counter contains the tuples which are still visible to other transactions.  This approach doesn't include
tupleswhich are deleted on pages where vacuum operation is already finished. 

In my opinion the second approach gives the value nearer to the actual value,
because it includes some of the new dead tuples also. Please correct me if anything wrong in my analysis.

Regards,
Hari babu.



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Changing pg_dump default file format
Next
From: Jan Lentfer
Date:
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]