Re: Reduce maximum error in tuples estimation after vacuum. - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Reduce maximum error in tuples estimation after vacuum.
Date:
Msg-id: 000601ce77ad$7d3388e0$779a9aa0$@kapila@huawei.com
In response to: Re: Reduce maximum error in tuples estimation after vacuum. (Amit Kapila <amit.kapila@huawei.com>)
List: pgsql-hackers
On Thursday, June 27, 2013 4:58 PM Amit Kapila wrote:
> On Wednesday, June 26, 2013 7:40 AM Kyotaro HORIGUCHI wrote:
> > I've recovered from messing up.
> >
> > <snip>
> > > Please let me have a bit of time to diagnose this.
> >
> > I was completely messed up and walking on the wrong way. I looked into
> > the vacuum for UPDATEs, not DELETEs, so it's quite reasonable to have
> > such results.
> >
> > The renewed test script attached shows the verbose output of vacuum
> > after the deletes. I had the following output from it.
> >
> > # I believe this runs for you..
> >
> > | INFO:  "t": found 989999 removable, 110 nonremovable row
> > | versions in 6308 out of 10829 pages
> >
> > In such a partially-scanned case, lazy_scan_heap() tries to estimate
> > the resulting num_tuples in vac_estimate_reltuples() assuming
> > uniformity of tuple density, which fails for such a strong imbalance
> > made by bulk updates.
> >
> > Do you find any differences between what you will have and the
> > following I had?
>
> I could see the same output with your latest script, and I could also
> reproduce the test if I ran it with individual SQL statements.
> One of the main points for reproducing the individual test was to keep
> autovacuum = off.

I checked further why I could not reproduce the issue with autovacuum = on.
The reason is that it starts the analyzer, which changes the value of
reltuples in pg_class, and after that the estimated and real values become
the same. Kindly refer to the code below:

relation_needs_vacanalyze()
{
    ..
    anltuples = tabentry->changes_since_analyze;
    ..
    anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
    ..
    *doanalyze = (anltuples > anlthresh);
}

Test Results
------------
postgres=# drop table if exists t;
DROP TABLE
postgres=# create table t (a int, b int, c int, d int default 0, e int default 0, f int default 0);
CREATE TABLE
postgres=# insert into t (select a, (random() * 100000)::int from generate_series((select count(*) from t) + 1, 1000000) a);
INSERT 0 1000000
postgres=# update t set b = b + 1 where a < (select count(*) from t) * 0.7;
UPDATE 699999
postgres=# vacuum t;
VACUUM
postgres=# delete from t where a < (select count(*) from t) * 0.99;
DELETE 989999
postgres=# vacuum t;
VACUUM
postgres=# select c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t) as tuples, reltuples::float / (select count(*) from t) as ratio from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
 relpages | n_live_tup | reltuples | tuples |      ratio
----------+------------+-----------+--------+------------------
     6370 |     417600 |    417600 |  10001 | 41.7558244175582
(1 row)

Here I waited for 1 minute (sufficient time for the analyzer to get
triggered, if required). In fact, if you run ANALYZE t, that would also
have served the purpose.

postgres=# select c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t) as tuples, reltuples::float / (select count(*) from t) as ratio from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
 relpages | n_live_tup | reltuples | tuples | ratio
----------+------------+-----------+--------+-------
     6370 |      10001 |     10001 |  10001 |     1
(1 row)

Now, if a subsequent analyzer run corrects the estimate, don't you think
that is sufficient for the problem reported?

With Regards,
Amit Kapila.
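For reference, a rough sketch of the trigger arithmetic in the snippet above,
assuming the default settings autovacuum_analyze_threshold = 50 and
autovacuum_analyze_scale_factor = 0.1 (the usual sources of anl_base_thresh
and anl_scale_factor when no per-table reloptions are set) and taking
reltuples = 417600 from the first result: anlthresh = 50 + 0.1 * 417600 =
41810, while the DELETE alone contributes 989999 to changes_since_analyze,
so *doanalyze comes out true on the next autovacuum cycle. The settings in
play can be checked with:

select name, setting
  from pg_settings
 where name in ('autovacuum_analyze_threshold',
                'autovacuum_analyze_scale_factor',
                'autovacuum_naptime');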
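To confirm that it was autoanalyze (rather than a manual ANALYZE) that
corrected the estimate after the one-minute wait, a query along the
following lines can be used; the timestamps will of course differ from run
to run:

select relname, n_live_tup, last_autoanalyze, autoanalyze_count
  from pg_stat_user_tables
 where relname = 't';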