Re: Reduce maximum error in tuples estimation after vacuum. - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Reduce maximum error in tuples estimation after vacuum.
Date:
Msg-id: 000601ce77ad$7d3388e0$779a9aa0$@kapila@huawei.com
In response to: Re: Reduce maximum error in tuples estimation after vacuum. (Amit Kapila <amit.kapila@huawei.com>)
List: pgsql-hackers
On Thursday, June 27, 2013 4:58 PM Amit Kapila wrote:
> On Wednesday, June 26, 2013 7:40 AM Kyotaro HORIGUCHI wrote:
> > I've recovered from messing up.
> >
> > <snip>
> > > Please let me have a bit of time to diagnose this.
> >
> > I was completely messed up and walking on the wrong way. I looked into
> > the vacuum for UPDATEs, not DELETEs, so it's quite reasonable to have
> > such results.
> >
> > The renewed test script attached shows the verbose output of vacuum
> > after the deletes. I had the following output from it.
> >
> > # I believe this runs for you..
> >
> > | INFO:  "t": found 989999 removable, 110 nonremovable row
> > | versions in 6308 out of 10829 pages
> >
> > In such a partially-scanned case, lazy_scan_heap() tries to estimate
> > the resulting num_tuples in vac_estimate_reltuples() assuming
> > uniformity of tuple density, which fails for such a strong imbalance
> > made by bulk updates.
> >
> > Do you find any differences between what you will have and the
> > following I had?
>
> I could see the same output with your latest script, and I could also
> reproduce the test if I ran it with individual SQL statements.
> One of the main points for reproducing the individual test was to keep
> autovacuum = off.

I checked further why I could not reproduce the issue with autovacuum = on.
The reason is that it starts the analyzer, which changes the value of
reltuples in pg_class, and after that the estimated and real values become
the same. Kindly refer to the code below:

relation_needs_vacanalyze()
{
    ..
    anltuples = tabentry->changes_since_analyze;
    ..
    anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
    ..
    *doanalyze = (anltuples > anlthresh);
}

Test Results
------------
postgres=# drop table if exists t;
DROP TABLE
postgres=# create table t (a int, b int, c int, d int default 0, e int default 0, f int default 0);
CREATE TABLE
postgres=# insert into t (select a, (random() * 100000)::int from generate_series((select count(*) from t) + 1, 1000000) a);
INSERT 0 1000000
postgres=# update t set b = b + 1 where a < (select count(*) from t) * 0.7;
UPDATE 699999
postgres=# vacuum t;
VACUUM
postgres=# delete from t where a < (select count(*) from t) * 0.99;
DELETE 989999
postgres=# vacuum t;
VACUUM
postgres=# select c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t) as tuples, reltuples::float / (select count(*) from t) as ratio from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
 relpages | n_live_tup | reltuples | tuples |      ratio
----------+------------+-----------+--------+------------------
     6370 |     417600 |    417600 |  10001 | 41.7558244175582
(1 row)

Here I waited for 1 minute (sufficient time for the analyzer to get
triggered, if required). In fact, if you run ANALYZE t, that would also
have served the purpose.

postgres=# select c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t) as tuples, reltuples::float / (select count(*) from t) as ratio from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
 relpages | n_live_tup | reltuples | tuples | ratio
----------+------------+-----------+--------+-------
     6370 |      10001 |     10001 |  10001 |     1
(1 row)

Now, if a subsequent analyzer run corrects the estimate, don't you think
that is sufficient for the problem reported?

With Regards,
Amit Kapila.
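For reference, a rough sketch of the trigger arithmetic in the snippet above,
assuming the default settings autovacuum_analyze_threshold = 50 and
autovacuum_analyze_scale_factor = 0.1 (the usual sources of anl_base_thresh
and anl_scale_factor when no per-table reloptions are set) and taking
reltuples = 417600 from the first result: anlthresh = 50 + 0.1 * 417600 =
41810, while the DELETE alone contributes 989999 to changes_since_analyze,
so *doanalyze comes out true on the next autovacuum cycle. The settings in
play can be checked with:

select name, setting
  from pg_settings
 where name in ('autovacuum_analyze_threshold',
                'autovacuum_analyze_scale_factor',
                'autovacuum_naptime');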
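To confirm that it was autoanalyze (rather than a manual ANALYZE) that
corrected the estimate after the one-minute wait, a query along the
following lines can be used; the timestamps will of course differ from run
to run:

select relname, n_live_tup, last_autoanalyze, autoanalyze_count
  from pg_stat_user_tables
 where relname = 't';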