Re: Reduce maximum error in tuples estimation after vacuum. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Reduce maximum error in tuples estimation after vacuum.
Date
Msg-id 20130703.165122.139640784.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Reduce maximum error in tuples estimation after vacuum.  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Reduce maximum error in tuples estimation after vacuum.
List pgsql-hackers
Hello,

> I could see the same output with your latest script, also I could reproduce
> the test if I run the test with individual sql statements.
> One of the main point for reproducing individual test was to keep autovacuum
> = off.

I see. Autovacuum's nap time is 60 sconds for the default
settings. Your operation might help it to snipe the window
between the last massive delete and the next explict vacuum in
store_result().. Anyway setting autovacuum to off should aid to
make clean environment fot this issue.

> Now I can look into it further, I have still not gone through in detail
> about your new approach to calculate the reltuples, but I am wondering
> whether there can be anyway with which estimates can be improved with
> different calculation in vac_estimate_reltuples().

I'll explain this in other words alghough It might be
repetitious.

It is tough to decide how to modify there. Currently I decided to
preserve vac_estimate_reltuples as possible as it is. For that
objective, I picked up old_rel_tuples as intermediate variable
for the aid to 'deceive' the function. This can be different form
deciding to separate this estimation function from that for
analyze.

As I described before, vac_estimates_reltuples has a presumption
that the tuple density in skipped pages is not so different from
that in whole table before vacuuming. Since the density is
calculated without using any hint about the skipped pages, and it
cannot tell how much tuples aganst pg_class.reltuples is already
dead, the value can be far different from the true one and cannot
be verified. Given that we canot use
pg_stat_user_tables.n_dead_tup, reading all pages can fix it but
the penalty should be intolerable.

Using FSM to know the used bytes in skipped pages (which is all
visible by the definition) seems to give good estimations of the
tuples in the skipped pages to some extent assuming the
uniformity of tuple length. Of course strong deviation in length
can deceive the algorithm.

Does it make sense for you?

I might could show the numerical explanation but I'm afraind I
can't do it for now. I'll be able to take time sooner... (also
for reviewing..)

> One thing I have observed that 2nd parameter is_analyze of
> vac_estimate_reltuples() is currently not used.

Mmm, it seems to have been useless from the beginning of the
function...

> I cannot work on it till early next week, so others are welcome to join


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: "Etsuro Fujita"
Date:
Subject: Re: Patch: clean up addRangeTableEntryForFunction
Next
From: Atri Sharma
Date:
Subject: Re: A better way than tweaking NTUP_PER_BUCKET