Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: New IndexAM API controlling index vacuum strategies
Msg-id: CAH2-WzmmNDhY2t8QVeww=-PFg_1ThW=04+8xFtETv7kpoULVHw@mail.gmail.com
In response to: Re: New IndexAM API controlling index vacuum strategies (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: New IndexAM API controlling index vacuum strategies
List: pgsql-hackers
On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Looks good. I'll give this version a review now. I will do a lot more
> soon. I need to come up with a good benchmark for this, that I can
> return to again and again during review as needed.

I performed another benchmark, similar to the last one but with the latest version (v2), and over a much longer period. Attached is a summary of the whole benchmark, and log_autovacuum output from the logs of both the master branch and the patch.

This was pgbench scale 2000, 4 indexes on pgbench_accounts, and a transaction with one update and two selects. Each run was 4 hours, and we alternated between patch and master for each run, and between 16 and 32 clients. There were 8 four-hour runs in total, so the entire set of runs took 8 * 4 hours = 32 hours (not including initial load time and a few other small things like that). I used a 10k TPS rate limit, so TPS isn't interesting here. Latency is interesting -- we see a nice improvement in latency (i.e. a reduction) for the patch (see all.summary.out).

The benefits of the patch are clearly visible when I drill down and look at the details. Each pgbench_accounts autovacuum VACUUM operation can finish faster with the patch because it can often skip at least some indexes (usually the PK, sometimes 3 out of 4 indexes total). But it's more subtle than some might assume. We're skipping indexes that VACUUM actually would have deleted *some* index tuples from, which is very good. Bottom-up index deletion is usually lazy, and only occasionally very eager, so you still have plenty of "floating garbage" index tuples in most pages. And now we see VACUUM behave a little more like bottom-up index deletion -- it is lazy when that is appropriate (with indexes that really only have floating garbage that is spread diffusely throughout the index structure), and eager when that is appropriate (with indexes that get much more garbage).
The benefit is not really that we're avoiding doing I/O for index vacuuming (though that is one of the smaller benefits here). The real benefit is that VACUUM is not dirtying pages, since it skips indexes when it would be "premature" to vacuum them from an efficiency point of view. This is important because we know that Postgres throughput is very often limited by page cleaning. Also, the "economics" of this new behavior make perfect sense -- obviously it's more efficient to delay garbage cleanup until the point when the same page will be modified by a backend anyway -- in the case of this benchmark via bottom-up index deletion (which deletes all garbage tuples in the leaf page at the point that it runs for a subset of pointed-to heap pages -- it's not using an oldestXmin cutoff from 30 minutes ago). So whenever we dirty a page, we now get more value per additional page dirtied.

I believe that controlling the number of pages dirtied by VACUUM is usually much more important than reducing the amount of read I/O from VACUUM, for reasons I go into on the recent "vacuum_cost_page_miss default value and modern hardware" thread. As a further consequence of all this, VACUUM can go faster safely and sustainably (since the cost limit is not affected so much by vacuum_cost_page_miss), which has its own benefits (e.g. the oldestXmin cutoff doesn't get so old towards the end).

Another closely related huge improvement that we see here is that the number of FPIs generated by VACUUM can be significantly reduced. This cost is closely related to the cost of dirtying pages, but it's worth mentioning separately. You'll see some of that in the log_autovacuum output I attached.

There is an archive with much more detailed information, including dumps from most pg_stat_* views at key intervals. This has way more information than anybody is likely to want:

https://drive.google.com/file/d/1OTiErELKRZmYnuJuczO2Tfcm1-cBYITd/view?usp=sharing

I did notice a problem, though.
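(As an aside on the cost-limit point above: a quick arithmetic sketch, assuming the Postgres 13 defaults for the cost-based vacuum delay GUCs -- vacuum_cost_page_hit=1, vacuum_cost_page_miss=10, vacuum_cost_page_dirty=20, and a cost limit of 200 per delay cycle. This is an illustration only, not code from the patch.)

```c
/*
 * Assumed Postgres 13 defaults for cost-based vacuum delay (see the
 * vacuum_cost_* GUCs); illustration only, not code from the patch.
 */
#define VACUUM_COST_PAGE_HIT    1
#define VACUUM_COST_PAGE_MISS  10
#define VACUUM_COST_PAGE_DIRTY 20
#define VACUUM_COST_LIMIT     200	/* budget spent before each cost delay */

/* Pages processed per delay cycle when every page incurs the given cost */
int
pages_per_delay_cycle(int page_cost)
{
	return VACUUM_COST_LIMIT / page_cost;
}
```

With these numbers, a VACUUM that mostly dirties pages covers only 10 pages per delay cycle, versus 20 for pure cache misses -- so lowering vacuum_cost_page_miss lets VACUUM read faster while dirtying remains the cost that actually throttles it.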
I now think that the criterion for skipping an index vacuum in the third patch from the series is too conservative, and that this led to an excessive number of index vacuums with the patch. This is probably because there was a tiny number of page splits in some of the indexes that were not really supposed to grow. I believe that this is caused by ANALYZE running -- I think that it prevented bottom-up deletion from keeping a few of the hottest pages from splitting (that can take 5 or 6 seconds) at a few points over the 32 hour run. For example, the index named "tenner" grew by 9 blocks, starting out at 230,701 and ending up at 230,710 (to see this, extract the files from the archive and "diff patch.r1c16.initial_pg_relation_size.out patch.r2c32.after_pg_relation_size.out").

I now think that 0 blocks added is unnecessarily restrictive -- a small tolerance still seems like a good idea, though (let's still be somewhat conservative about it). Maybe a better criterion would be for nbtree to always proceed with index vacuuming when the index size is less than 2048 blocks (16MiB with 8KiB BLCKSZ). If an index is larger than that, then compare the last/old block count to the current block count (at the point that we decide if index vacuuming is going to go ahead) by rounding up both values to the next highest 2048 block increment. This formula is pretty arbitrary, but probably works as well as most others. It's a good iteration for the next version of the patch/further testing, at least.

BTW, it would be nice if there was more instrumentation, say in the log output produced when log_autovacuum is on. That would make it easier to run these benchmarks -- I could verify my understanding of the work done for each particular autovacuum operation represented in the log.
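To make the block-count criterion proposed above concrete, here is a minimal sketch in C (hypothetical names, not code from the patch; the actual patch may structure this differently):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the proposed skip criterion: 2048 blocks is
 * 16MiB with the default 8KiB BLCKSZ.
 */
#define SKIP_VACUUM_BLOCK_GRANULARITY 2048

/* Round a block count up to the next highest 2048 block increment */
uint32_t
round_up_to_granularity(uint32_t nblocks)
{
	uint32_t	granularity = SKIP_VACUUM_BLOCK_GRANULARITY;

	return ((nblocks + granularity - 1) / granularity) * granularity;
}

/*
 * Should nbtree proceed with index vacuuming, given the block count
 * recorded last time and the current block count?
 */
bool
btree_should_vacuum(uint32_t last_nblocks, uint32_t curr_nblocks)
{
	/* Small index: always proceed with index vacuuming */
	if (curr_nblocks < SKIP_VACUUM_BLOCK_GRANULARITY)
		return true;

	/*
	 * Larger index: proceed only when it crossed into a higher
	 * 2048-block increment, tolerating a handful of page splits
	 * (e.g. the 9 blocks that "tenner" grew by would not count)
	 */
	return round_up_to_granularity(curr_nblocks) >
		round_up_to_granularity(last_nblocks);
}
```

Under this sketch, "tenner" growing from 230,701 to 230,710 blocks rounds to 231,424 in both cases, so its index vacuum would be skipped; growth past the next 2048-block boundary would force one.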
Though the default log_autovacuum log output is quite informative, it would be nice if the specifics were more obvious (maybe this could just be for the review/testing, but it might become something for users if it seems useful).

--
Peter Geoghegan