Reduce maximum error in tuples estimation after vacuum. - Mailing list pgsql-hackers
From:    Kyotaro HORIGUCHI
Subject: Reduce maximum error in tuples estimation after vacuum.
Date:
Msg-id:  20130614.173528.146929812.horiguchi.kyotaro@lab.ntt.co.jp
Responses: Re: Reduce maximum error in tuples estimation after vacuum.
           Re: Reduce maximum error in tuples estimation after vacuum.
List:    pgsql-hackers
Hello,

PostgreSQL estimates the number of live tuples after a vacuum that has left some buffers unscanned. This estimation does well in most cases, but it can produce a completely wrong result when there is a strong imbalance of tuple density. For example:

  create table t (a int, b int);
  insert into t (select a, (random() * 100000)::int
                 from generate_series((select count(*) from t) + 1, 1000000) a);
  update t set b = b + 1 where a < (select count(*) from t) * 0.7;
  vacuum t;
  delete from t where a < (select count(*) from t) * 0.99;

After this, pg_stat_user_tables.n_live_tup shows 417670, which is about 41 times larger than the real number of rows, 10001. What makes it worse, neither autovacuum nor autoanalyze will run until n_dead_tup exceeds about 8 times the real number of tuples in the table under the default settings.

| postgres=# select n_live_tup, n_dead_tup
|              from pg_stat_user_tables where relname='t';
|  n_live_tup | n_dead_tup
| ------------+------------
|      417670 |          0
|
| postgres=# select reltuples from pg_class where relname='t';
|  reltuples
| -----------
|     417670
|
| postgres=# select count(*) from t;
|  count
| -------
|  10001

Using n_dead_tup from before the vacuum seems to make this better, but I heard that that plan was abandoned for some reason I don't know. So I have come up with another plan: use the FSM to estimate the tuple density in the unscanned pages. The point is to make the estimation rely on the uniformity of tuple length instead of the uniformity of tuple density. This change seems to keep the error within a few times the real number of tuples. The additional page reads for the FSM amount to about 1/4000 (SlotsPerFSMPage) of the number of skipped pages, which I suppose is tolerable during vacuum.

The overall algorithm can be illustrated as below (a standalone sketch of the same arithmetic follows the attachment list):

- Sum up, over all scanned pages, the used bytes, the max offnum (PageGetMaxOffsetNumber), and the free bytes available for tuple data after the page is vacuumed.
- Sum up the free bytes reported by the FSM over all skipped pages.
- Calculate the mean tuple length from the overall used bytes, the sum of max offnums, and the number of scanned pages.
- Guess the tuple density in the skipped pages using the overall free bytes from the FSM and the mean tuple length calculated above.
- Finally, feed the estimated number of live tuples BEFORE the vacuum into vac_estimate_reltuples.

Of course this method is instead affected by imbalance of tuple LENGTH, but the error also seems to stay within a few times the real number of tuples. For rows of invariable length, a test on head shows the following, where "tups est" is pg_class.reltuples and "tups real" is count(*):

 del% | pages | n_live_tup | tups est | tups real | est/real | bufs
------+-------+------------+----------+-----------+----------+------
 0.9  |  4425 |     100001 |   470626 |    100001 |    4.706 | 3985
 0.95 |  4425 |      50001 |   441196 |     50001 |    8.824 | 4206
 0.99 |  4425 |     417670 |   417670 |     10001 |   41.763 | 4383

and with the patch:

 0.9  |  4425 |     106169 |   106169 |    100001 |    1.062 | 3985
 0.95 |  4425 |      56373 |    56373 |     50001 |    1.127 | 4206
 0.99 |  4425 |      10001 |    16535 |     10001 |    1.653 | 4383

What do you think about this?

=====
The attached files are:

- vacuum_est_improve_20130614.patch: the patch for this proposal
- vactest.sql: SQL script to cause the situation
- vactest.sh: test script to measure the errors related to this patch
- test_result.txt: the full test results for various deletion ratios, as produced by the test script above
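To make the arithmetic concrete, here is a minimal standalone sketch of the estimation step. The function and its double-typed parameters are my own illustration; the real code (in the patch below) runs inline in lazy_scan_heap, and only when tups_vacuumed > num_tuples and more than 10% of the pages were scanned.

/*
 * Sketch of the skipped-page density estimation; assumes skipped_pages > 0.
 *
 *   scanned_used_bytes        - sum of PageGetItemBytes() over scanned pages
 *   scanned_freespace         - sum of PageGetHeapFreeSpace() over scanned pages
 *   scanned_possible_max_tups - sum of PageGetMaxOffsetNumber() over scanned pages
 *   total_freespace           - sum of GetRecordedFreeSpace() over all pages
 */
static double
guess_tuples_before_vacuum(double nblocks, double scanned_pages,
                           double scanned_used_bytes,
                           double scanned_freespace,
                           double scanned_possible_max_tups,
                           double total_freespace)
{
    double skipped_pages = nblocks - scanned_pages;

    /* average bytes per page usable for tuple data: used plus still free */
    double page_capacity =
        (scanned_used_bytes + scanned_freespace) / scanned_pages;

    /* mean tuple length, relying on tuple length being uniform */
    double mean_tup_len =
        page_capacity / (scanned_possible_max_tups / scanned_pages);

    /* free space on the skipped pages, as recorded in the FSM */
    double skipped_freespace = total_freespace - scanned_freespace;

    /* occupied bytes per skipped page, divided by the mean tuple length */
    double skipped_tups_per_blk =
        (page_capacity - skipped_freespace / skipped_pages) / mean_tup_len;

    /*
     * The skipped pages were untouched by this vacuum, so their density
     * approximates the whole relation's density BEFORE the vacuum; scale
     * it to all blocks and feed the result to vac_estimate_reltuples()
     * in place of the stale pg_class.reltuples.
     */
    return skipped_tups_per_blk * nblocks;
}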
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

==== vacuum_est_improve_20130614.patch

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d6d20fd..1e581c1 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1280,7 +1280,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	*totalrows = vac_estimate_reltuples(onerel, true,
 										totalblocks,
 										bs.m,
-										liverows);
+										liverows,
+										onerel->rd_rel->reltuples);
 	if (bs.m > 0)
 		*totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
 	else
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 641c740..4bdf0c1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -501,10 +501,10 @@ double
 vac_estimate_reltuples(Relation relation, bool is_analyze,
 					   BlockNumber total_pages,
 					   BlockNumber scanned_pages,
-					   double scanned_tuples)
+					   double scanned_tuples,
+					   double old_rel_tuples)
 {
 	BlockNumber old_rel_pages = relation->rd_rel->relpages;
-	double		old_rel_tuples = relation->rd_rel->reltuples;
 	double		old_density;
 	double		new_density;
 	double		multiplier;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 7e46f9e..80304a6 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -396,7 +396,11 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	double		num_tuples,
 				tups_vacuumed,
 				nkeep,
-				nunused;
+				nunused,
+				scanned_used_bytes,
+				scanned_possible_max_tups,
+				scanned_freespace,
+				total_freespace;
 	IndexBulkDeleteResult **indstats;
 	int			i;
 	PGRUsage	ru0;
@@ -414,6 +418,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 
 	empty_pages = vacuumed_pages = 0;
 	num_tuples = tups_vacuumed = nkeep = nunused = 0;
+	scanned_used_bytes = scanned_possible_max_tups = 0;
+	scanned_freespace = total_freespace = 0;
 
 	indstats = (IndexBulkDeleteResult **)
 		palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
@@ -487,6 +493,10 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
 
+		/* the FSM slot for the previous block should be well maintained */
+		if (blkno > 0)
+			total_freespace += GetRecordedFreeSpace(onerel, blkno - 1);
+
 		if (blkno == next_not_all_visible_block)
 		{
 			/* Time to advance next_not_all_visible_block */
@@ -692,6 +702,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		hastup = false;
 		prev_dead_count = vacrelstats->num_dead_tuples;
 		maxoff = PageGetMaxOffsetNumber(page);
+		scanned_possible_max_tups += maxoff;
 
 		/*
 		 * Note: If you change anything in the loop below, also look at
@@ -892,6 +903,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		}
 
 		freespace = PageGetHeapFreeSpace(page);
+		scanned_freespace += freespace;
+		scanned_used_bytes += PageGetItemBytes(page);
 
 		/* mark page all-visible, if appropriate */
 		if (all_visible && !all_visible_according_to_vm)
@@ -969,15 +982,45 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	total_freespace += GetRecordedFreeSpace(onerel, nblocks - 1);
+
 	/* save stats for use later */
 	vacrelstats->scanned_tuples = num_tuples;
 	vacrelstats->tuples_deleted = tups_vacuumed;
 
-	/* now we can compute the new value for pg_class.reltuples */
-	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-														 nblocks,
-														 vacrelstats->scanned_pages,
-														 num_tuples);
+	{
+		double		prev_ntuples = onerel->rd_rel->reltuples;
+
+		if (tups_vacuumed > num_tuples &&
+			(double) vacrelstats->scanned_pages / nblocks > 0.1)
+		{
+			/*
+			 * When a certain amount of tuples are vacuumed, there might be
+			 * an unignorable imbalance in the number of tuples between
+			 * scanned pages and skipped ones. So we guess the number of
+			 * tuples on the skipped pages assuming the uniformity of tuple
+			 * size across the whole relation, and the correctness of the
+			 * FSM at this point.
+			 */
+			BlockNumber scanned_pages = vacrelstats->scanned_pages;
+			BlockNumber skipped_pages = nblocks - scanned_pages;
+			double		buffer_max_bytes =
+				(scanned_used_bytes + scanned_freespace) / scanned_pages;
+			double		skipped_freespace = total_freespace - scanned_freespace;
+			double		mean_tup_len =
+				buffer_max_bytes / (scanned_possible_max_tups / scanned_pages);
+			double		skipped_tups_per_blk =
+				(buffer_max_bytes - skipped_freespace / skipped_pages) /
+				mean_tup_len;
+
+			prev_ntuples = skipped_tups_per_blk * nblocks;
+		}
+
+		/* now we can compute the new value for pg_class.reltuples */
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 nblocks,
+															 vacrelstats->scanned_pages,
+															 num_tuples,
+															 prev_ntuples);
+	}
 
 	/*
 	 * Release any remaining pin on visibility map page.
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d8dd8b0..aedc48e 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -147,7 +147,8 @@ extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
 extern double vac_estimate_reltuples(Relation relation, bool is_analyze,
 					   BlockNumber total_pages,
 					   BlockNumber scanned_pages,
-					   double scanned_tuples);
+					   double scanned_tuples,
+					   double old_rel_tuples);
 extern void vac_update_relstats(Relation relation,
 					BlockNumber num_pages,
 					double num_tuples,
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index abcf8a0..aa067d9 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -233,6 +233,13 @@ typedef PageHeaderData *PageHeader;
 	((ItemId) (&((PageHeader) (page))->pd_linp[(offsetNumber) - 1]))
 
 /*
+ * PageGetItemBytes
+ *		Returns the size in bytes occupied by item data on the page
+ */
+#define PageGetItemBytes(page) \
+	(((PageHeader) (page))->pd_special - ((PageHeader) (page))->pd_upper)
+
+/*
  * PageGetContents
  *		To be used in case the page does not contain item pointers.
  *
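As a side note on the bufpage.h hunk, here is how the new PageGetItemBytes() fits the existing page-layout macros. The check function is purely illustrative (mine, not part of the patch); PageGetExactFreeSpace() is the existing exact counterpart of the PageGetHeapFreeSpace() that the patch actually sums.

#include "postgres.h"
#include "storage/bufpage.h"

/*
 * On a valid heap page the line pointer array ends at pd_lower, the free
 * hole spans [pd_lower, pd_upper), and tuple data occupies [pd_upper,
 * pd_special).  Item bytes plus exact free space therefore account for
 * everything between pd_lower and pd_special, which is the per-page
 * "capacity" the estimation divides by maxoff to get a mean tuple length.
 */
static void
check_page_accounting(Page page)
{
	PageHeader	ph = (PageHeader) page;

	Size		item_bytes = PageGetItemBytes(page);		/* pd_special - pd_upper */
	Size		free_bytes = PageGetExactFreeSpace(page);	/* pd_upper - pd_lower */

	Assert(item_bytes + free_bytes == (Size) (ph->pd_special - ph->pd_lower));
}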
==== vactest.sql

drop table if exists t;
create table t (a int, b int);
insert into t (select a, (random() * 100000)::int
               from generate_series((select count(*) from t) + 1, 1000000) a);
update t set b = b + 1 where a < (select count(*) from t) * 0.7;
vacuum t;
delete from t where a < (select count(*) from t) * 0.99;
vacuum t;
select c.relpages, s.n_live_tup, c.reltuples,
       (select count(*) from t) as tuples,
       reltuples::float / (select count(*) from t) as ratio
  from pg_stat_user_tables s, pg_class c
 where s.relname = 't' and c.relname = 't';

==== vactest.sh

#!/bin/sh

dbname="postgres"

function insert_result() {
    psql ${dbname} -f - > /dev/null <<EOF
insert into result
  select $1, $2, $3, c.relpages, s.n_live_tup, c.reltuples,
         (select count(*) from t),
         reltuples::float / (select count(*) from t)
    from pg_stat_user_tables s, pg_class c
   where s.relname = 't' and c.relname = 't';
EOF
}

function vac_with_bufs() {
    psql ${dbname} -c "vacuum verbose t" |& \
        egrep "INFO: *\"t\": found " | \
        sed -e 's/^.* versions in \([0-9]*\) .*$/\1/'
}

function update_result_bufs() {
    local test_no=$1
    local delratio=$2
    local vac_no=$3
    local bufs=$4
    psql ${dbname} -c "update result set bufs=${bufs} where \"#\"=$test_no and \"del%\"=$delratio and \"##\"=$vac_no" > /dev/null
}

function store_result() {
    local test_no=$1
    local delratio=$2
    scanned_bufs=`vac_with_bufs`
    insert_result $test_no $delratio 1
    update_result_bufs $test_no $delratio 1 $scanned_bufs
#   scanned_bufs=`vac_with_bufs`
#   insert_result $test_no $delratio 2
#   update_result_bufs $test_no $delratio 2 $scanned_bufs
}

function test1() {
    local delratio=$1

    echo "test1 ratio = $delratio"
    psql ${dbname} -f - > /dev/null <<EOF
drop table if exists t;
create table t (a int, b int);
insert into t (select a, (random() * 100000)::int
               from generate_series((select count(*) from t) + 1, $nrows) a);
update t set b = b + 1 where a < (select count(*) from t) * 0.7;
vacuum t;
delete from t where a < (select count(*) from t) * $delratio;
EOF
    store_result 1 $delratio
}

function test2() {
    local delratio=$1

    echo "test2 ratio = $delratio"
    psql ${dbname} -f - > /dev/null <<EOF
drop table if exists t;
create table t (a int, b text);
insert into t (select a, 'abcdefg'
               from generate_series((select count(*) from t) + 1, $nrows) a);
update t set b = repeat('abcdefghij', 250) where a < (select count(*) from t) * 0.7;
vacuum t;
delete from t where a < (select count(*) from t) * $delratio;
EOF
    store_result 2 $delratio
}

psql ${dbname} -f - > /dev/null <<EOF
drop table if exists result;
create table result ("#" int, "del%" float, "##" int,
                     pages int, n_live_tup int,
                     "tups est" int, "tups real" int,
                     "est/real" numeric(10, 3), bufs int default 0);
EOF

nrows=1000000
test1 0.1
test1 0.2
test1 0.3
test1 0.4
test1 0.5
test1 0.6
test1 0.7
test1 0.8
test1 0.9
test1 0.95
test1 0.99

nrows=100000
test2 0.1
test2 0.2
test2 0.3
test2 0.4
test2 0.5
test2 0.6
test2 0.7
test2 0.8
test2 0.9
test2 0.95
test2 0.99

psql ${dbname} -c 'select * from result order by "#", "del%", "##"'
==== BEFORE patch

# = 1 is fixed-size rows, # = 2 is variable-size rows

 # | del% | pages | n_live_tup | tups est | tups real | est/real | bufs
---+------+-------+------------+----------+-----------+----------+------
 1 | 0.1  |  7523 |     900001 |   941326 |    900001 |    1.046 |  444
 1 | 0.2  |  7523 |     800001 |   882465 |    800001 |    1.103 |  886
 1 | 0.3  |  7523 |     700001 |   823697 |    700001 |    1.177 | 1329
 1 | 0.4  |  7523 |     600001 |   764836 |    600001 |    1.275 | 1771
 1 | 0.5  |  7523 |     500001 |   706068 |    500001 |    1.412 | 2214
 1 | 0.6  |  7523 |     400001 |   647206 |    400001 |    1.618 | 2656
 1 | 0.7  |  4425 |     300001 |   588239 |    300001 |    1.961 | 3099
 1 | 0.8  |  4425 |     200001 |   529394 |    200001 |    2.647 | 3542
 1 | 0.9  |  4425 |     100001 |   470626 |    100001 |    4.706 | 3985
 1 | 0.95 |  4425 |      50001 |   441196 |     50001 |    8.824 | 4206
 1 | 0.99 |  4425 |     417670 |   417670 |     10001 |   41.763 | 4383
 2 | 0.1  |  1263 |     100000 |    91902 |     90001 |    1.021 |  104
 2 | 0.2  |  1263 |     100000 |    83737 |     80001 |    1.047 |  207
 2 | 0.3  |  1263 |     100000 |    75573 |     70001 |    1.080 |  310
 2 | 0.4  |  1263 |     100000 |    67409 |     60001 |    1.123 |  413
 2 | 0.5  |  1263 |     100000 |    59245 |     50001 |    1.185 |  516
 2 | 0.6  |  1263 |     100000 |    51099 |     40001 |    1.277 |  620
 2 | 0.7  |   541 |     100000 |    42855 |     30001 |    1.428 |  723
 2 | 0.8  |   541 |      38607 |    38607 |     20001 |    1.930 |  778
 2 | 0.9  |   541 |     100000 |    34321 |     10001 |    3.432 |  832
 2 | 0.95 |   541 |     100000 |    34930 |      5001 |    6.985 |  885
 2 | 0.99 |   541 |      30930 |    30930 |      1001 |   30.899 |  885

==== AFTER the patch

 # | del% | pages | n_live_tup | tups est | tups real | est/real | bufs
---+------+-------+------------+----------+-----------+----------+------
 1 | 0.1  |  7523 |     900001 |   941326 |    900001 |    1.046 |  444
 1 | 0.2  |  7523 |     800001 |   803449 |    800001 |    1.004 |  886
 1 | 0.3  |  7523 |     700001 |   703835 |    700001 |    1.005 | 1329
 1 | 0.4  |  7523 |     600001 |   604220 |    600001 |    1.007 | 1771
 1 | 0.5  |  7523 |     500001 |   504606 |    500001 |    1.009 | 2214
 1 | 0.6  |  7523 |     400001 |   404992 |    400001 |    1.012 | 2656
 1 | 0.7  |  4425 |     300001 |   305340 |    300001 |    1.018 | 3099
 1 | 0.8  |  4425 |     205758 |   205758 |    200001 |    1.029 | 3542
 1 | 0.9  |  4425 |     106169 |   106169 |    100001 |    1.062 | 3985
 1 | 0.95 |  4425 |      56373 |    56373 |     50001 |    1.127 | 4206
 1 | 0.99 |  4425 |      10001 |    16535 |     10001 |    1.653 | 4383
 2 | 0.1  |  1263 |     100000 |    91902 |     90001 |    1.021 |  104
 2 | 0.2  |  1263 |     100000 |    67589 |     80001 |    0.845 |  207
 2 | 0.3  |  1263 |     100000 |    57522 |     70001 |    0.822 |  310
 2 | 0.4  |  1263 |      60001 |    47489 |     60001 |    0.791 |  413
 2 | 0.5  |  1263 |     100000 |    37469 |     50001 |    0.749 |  516
 2 | 0.6  |  1263 |      40001 |    27455 |     40001 |    0.686 |  620
 2 | 0.7  |   541 |     100000 |    17432 |     30001 |    0.581 |  723
 2 | 0.8  |   541 |      20001 |    12894 |     20001 |    0.645 |  778
 2 | 0.9  |   541 |       7570 |     7570 |     10001 |    0.757 |  832
 2 | 0.95 |   541 |     100000 |     6605 |      5001 |    1.321 |  885
 2 | 0.99 |   541 |     100000 |     2598 |      1001 |    2.595 |  885
(44 rows)