Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Msg-id: CAH2-WznjQQwSPSxBiZ6wQyBiKqFmfdjOdeqp28otqw551T7jcQ@mail.gmail.com
In response to: Re: Deleting older versions in unique indexes to avoid page splits (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Wed, Oct 14, 2020 at 7:40 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Right. The trick is to pay only a fixed low cost (maybe as low as one
> heap page access) when we start out, and ratchet it up only if the
> first heap page access looks promising.

Just as an example of how the patch can help, consider the following
pgbench variant script:

\set aid1 random_gaussian(1, 100000 * :scale, 2.0)
\set aid2 random_gaussian(1, 100000 * :scale, 2.5)
\set aid3 random_gaussian(1, 100000 * :scale, 2.2)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
END;

(The details you see here are a bit arbitrary; don't worry about the
specifics too much.)

Before running the script with pgbench, I initialized pgbench to scale
1500, and made some changes to the indexing (in order to actually test
the patch). There was no standard pgbench_accounts PK. Instead, I
created a unique index with an INCLUDE column, which is enough to make
every update a non-HOT update. I also added two more redundant
non-unique indexes to create more overhead from non-HOT updates. It
looked like this:

create unique index aid_pkey_include_abalance on pgbench_accounts (aid)
  include (abalance);
create index one on pgbench_accounts (aid);
create index two on pgbench_accounts (aid);

So there were 3 indexes on the accounts table. I ran the script for two
hours with 16 clients on the patch, then for another two hours on master.
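The skew in the script comes from pgbench's random_gaussian(lb, ub, param)
function: accesses cluster around the middle of the aid range, which is what
concentrates non-HOT update garbage on a set of "hot" leaf pages. As a rough
Python sketch of the documented behavior (an approximation for illustration,
not pgbench's actual implementation):

```python
import random

def random_gaussian(lb, ub, param, rng=random):
    """Rough sketch of pgbench's random_gaussian(lb, ub, param):
    draw from a standard normal truncated to +/- param standard
    deviations, then map the result onto [lb, ub], so values near
    the middle of the range are much more likely."""
    while True:
        z = rng.gauss(0.0, 1.0)
        if abs(z) < param:  # re-draw samples that land in the tails
            break
    # Map (-param, param) onto [lb, ub], centered on the midpoint.
    return lb + int((ub - lb + 1) * (z + param) / (2.0 * param))

rng = random.Random(42)
scale = 1500
samples = [random_gaussian(1, 100000 * scale, 2.0, rng) for _ in range(10000)]
mid = 100000 * scale // 2
# Most accesses cluster around the middle accounts -- the "hot" rows.
near_mid = sum(abs(s - mid) < 100000 * scale // 4 for s in samples)
print(near_mid / len(samples))  # roughly 0.7 for param=2.0
```

With param values around 2.0-2.5, roughly 70% of accesses land in the middle
half of the key space, so a modest number of leaf pages absorbs most of the
version churn.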
After that time, all 3 indexes were exactly the same size with the
patch, but had almost doubled in size on master:

aid_pkey_include_abalance: 784,264 pages (or ~5.983 GiB)
one: 769,545 pages (or ~5.871 GiB)
two: 770,295 pages (or ~5.876 GiB)

(With the patch, all three indexes were 100% pristine -- they remained
precisely 411,289 pages in size by the end, which is ~3.137 GiB.)

Note that traditional deduplication is used by the indexes I've called
"one" and "two" here, but not by the INCLUDE index called
"aid_pkey_include_abalance". It makes little difference, though, for
reasons that will be obvious if you think about what this looks like at
the leaf page level. Cases that Postgres 13 deduplication does badly
with are often the ones that this new mechanism does well with.
Deduplication by deleting and deduplication by merging are truly
complementary -- I haven't just structured the code that way because it
was convenient to use the dedup infrastructure to get the duplicates at
the start. (Yes, it *was* convenient, but there clearly are cases where
the two mechanisms compete initially, before nbtree converges on the
best strategy at the local level. So FWIW this patch is a natural and
logical extension of the deduplication work in my mind.)

The TPS/throughput is about what you'd expect for the two hour run:

18,988.762398 TPS for the patch
11,123.551707 TPS for the master branch

That's a ~1.7x improvement, but I can get more than 3x by changing the
details at the start -- just add more indexes. I don't think that the
precise throughput difference you see here matters. The important point
is that we've more or less fixed a pathological set of behaviors that
have poorly understood cascading effects. Full disclosure: I rate
limited pgbench to 20k TPS for this run, which probably wasn't
significant because neither patch nor master hit that limit for long.
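The page counts and GiB figures above are consistent with the default 8 KiB
PostgreSQL page size; a quick back-of-the-envelope check (plain arithmetic,
nothing patch-specific):

```python
# Index pages are 8 KiB by default, so size in GiB = pages * 8192 / 2**30.
PAGE_SIZE = 8192

def pages_to_gib(pages: int) -> float:
    return pages * PAGE_SIZE / 2**30

print(f"{pages_to_gib(784_264):.3f}")  # ~5.983 (master, aid_pkey_include_abalance)
print(f"{pages_to_gib(411_289):.3f}")  # ~3.137 (patch, all three indexes)

# Throughput ratio for the two-hour runs:
print(f"{18_988.762398 / 11_123.551707:.2f}x")  # ~1.71x
```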
Big latency improvements for that same run, too.

Patch:

statement latencies in milliseconds:
0.001 \set aid1 random_gaussian(1, 100000 * :scale, 2.0)
0.000 \set aid2 random_gaussian(1, 100000 * :scale, 2.5)
0.000 \set aid3 random_gaussian(1, 100000 * :scale, 2.2)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.057 BEGIN;
0.294 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;
0.204 SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
0.195 SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
0.090 END;

Master:

statement latencies in milliseconds:
0.002 \set aid1 random_gaussian(1, 100000 * :scale, 2.0)
0.001 \set aid2 random_gaussian(1, 100000 * :scale, 2.5)
0.001 \set aid3 random_gaussian(1, 100000 * :scale, 2.2)
0.001 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.001 \set delta random(-5000, 5000)
0.084 BEGIN;
0.604 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;
0.317 SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
0.311 SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
0.120 END;

Note that the mechanism added by the patch naturally limits the number
of versions that are in the index for each logical row, which seems much
more important than the total amount of garbage tuples cleaned up. It's
great that the index is half its original size, but even that is less
important than the effect of more or less bounding the worst case number
of heap pages accessed by point index scans.

Even though this patch shows big performance improvements (as well as
very small performance regressions for small indexes with skew), the
patch is mostly about stability. I believe that Postgres users want
greater stability and predictability in this area more than anything
else. The behavior of the system as a whole that we see for the master
branch here is not anywhere near linear.
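Putting the per-statement numbers side by side makes the improvement easier
to see; this is trivial arithmetic over the figures quoted above:

```python
# (patch_ms, master_ms) pairs taken from the pgbench -r reports above.
latencies = {
    "UPDATE": (0.294, 0.604),
    "SELECT (aid2)": (0.204, 0.317),
    "SELECT (aid3)": (0.195, 0.311),
}
for stmt, (patch_ms, master_ms) in latencies.items():
    print(f"{stmt}: {master_ms / patch_ms:.2f}x faster with the patch")
```

The UPDATE, which does all the version creation, improves by about 2x, while
the point SELECTs improve by roughly 1.5-1.6x.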
Page splits are of course expensive, but they also occur in distinct
waves [1] and have lasting consequences. They're very often highly
correlated, with clear tipping points, so you see relatively sudden
slowdowns in the real world. Worse still, with skew the number of hot
pages that you have can double in a short period of time. This very
probably happens at the worst possible time for the user, since the
database was likely already organically experiencing very high index
writes at the point of the first wave of splits (splits correlated both
within and across indexes on the same table). And, from that point on,
the number of FPIs for the same workload also doubles forever (or at
least until REINDEX).

[1] https://btw.informatik.uni-rostock.de/download/tagungsband/B2-2.pdf

--
Peter Geoghegan