Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From | Victor Yegorov |
---|---|
Subject | Re: Deleting older versions in unique indexes to avoid page splits |
Date | |
Msg-id | CAGnEbohqBDNX5kncK+X8BaOfLcFKMqUHiSNUOD0ofxCQzsOjdg@mail.gmail.com |
In response to | Re: Deleting older versions in unique indexes to avoid page splits (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Deleting older versions in unique indexes to avoid page splits
Re: Deleting older versions in unique indexes to avoid page splits |
List | pgsql-hackers |
On Thu, 31 Dec 2020 at 03:55, Peter Geoghegan <pg@bowt.ie> wrote:
Attached is v12, which fixed bitrot against the master branch. This
version has significant comment and documentation revisions. It is
functionally equivalent to v11, though.
I intend to commit the patch in the next couple of weeks. While it
certainly would be nice to get a more thorough review, I don't feel
that it is strictly necessary. The patch provides very significant
benefits with certain workloads that have traditionally been
considered an Achilles' heel for Postgres. Even zheap doesn't provide
a solution to these problems. The only thing that I can think of that
might reasonably be considered in competition with this design is
WARM, which hasn't been under active development since 2017 (I assume
that it has been abandoned by those involved).
I've looked through the v12 patch.
I like the new outline:
- _bt_delete_or_dedup_one_page() is the main entry for the new code
- first we try _bt_simpledel_pass(), which does improved cleanup of LP_DEAD entries
- then (if necessary) _bt_bottomupdel_pass() for bottom-up deletion
- finally, we perform _bt_dedup_pass() for deduplication
We split the leaf page only if all the actions above failed to provide enough space.
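The order of operations in the outline above can be sketched in C roughly as follows. This is only an illustration of the control flow, not the patch's code: the SketchPage struct, its byte counters, the SketchOutcome enum and the sketch_* functions are all hypothetical names invented here, and the real functions work on actual nbtree pages rather than on simple counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a full leaf page: only the numbers the cascade needs. */
typedef struct SketchPage
{
    size_t free_space;     /* bytes currently free on the page */
    size_t lp_dead_bytes;  /* bytes held by items already marked LP_DEAD */
    size_t stale_bytes;    /* bytes bottom-up deletion could reclaim (assumed known) */
    size_t dup_bytes;      /* bytes deduplication could save (assumed known) */
} SketchPage;

/* Which step, if any, made room for the incoming item. */
typedef enum SketchOutcome
{
    SKETCH_SIMPLE_DELETION,
    SKETCH_BOTTOMUP_DELETION,
    SKETCH_DEDUP,
    SKETCH_SPLIT
} SketchOutcome;

/* Move a reclaimable amount into free space and reset the counter. */
static void
sketch_reclaim(SketchPage *page, size_t *reclaimable)
{
    page->free_space += *reclaimable;
    *reclaimable = 0;
}

/*
 * Stand-in for the flow described for _bt_delete_or_dedup_one_page():
 * try each space-saving step in order and stop at the first one that
 * frees enough room for the new item of size newitemsz.
 */
SketchOutcome
sketch_delete_or_dedup_one_page(SketchPage *page, size_t newitemsz)
{
    assert(page != NULL);

    /* Step 1: simple deletion of LP_DEAD items */
    if (page->lp_dead_bytes > 0)
    {
        sketch_reclaim(page, &page->lp_dead_bytes);
        if (page->free_space >= newitemsz)
            return SKETCH_SIMPLE_DELETION;
    }

    /* Step 2: bottom-up deletion, only reached if step 1 was not enough */
    if (page->stale_bytes > 0)
    {
        sketch_reclaim(page, &page->stale_bytes);
        if (page->free_space >= newitemsz)
            return SKETCH_BOTTOMUP_DELETION;
    }

    /* Step 3: deduplication as the last attempt to avoid a split */
    if (page->dup_bytes > 0)
    {
        sketch_reclaim(page, &page->dup_bytes);
        if (page->free_space >= newitemsz)
            return SKETCH_DEDUP;
    }

    /* All steps failed to free enough space: the caller must split the page. */
    return SKETCH_SPLIT;
}
```

The point of the ordering in this sketch is that the cheapest step runs first, and a page split is the fallback only when none of the three steps frees enough space.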
Some comments on the code.
v12-0001
--------
1. For the following comment
+ * Only do this for key columns. A change to a non-key column within an
+ * INCLUDE index should not be considered because that's just payload to
+ * the index (they're not unlike table TIDs to the index AM).
The last part of it (in parentheses) is difficult to grasp because of
the double negative ("not unlike"). I think it would be better to rephrase it.
2. After reading the patch, I also think that the fact that index_unchanged_by_update()
and index_unchanged_by_update_var_walker() return opposite bool states
(i.e. when the latter returns true, the former returns false) is a bit misleading.
Although the logic as it is looks fine, maybe index_unchanged_by_update_var_walker()
can be renamed to avoid this confusion, to something like index_expression_changed_walker()?
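To illustrate the naming point, here is a minimal sketch in C of a walker that returns true when it finds a change, wrapped by a function that returns true when nothing changed. The signatures are deliberately simplified and hypothetical (a plain array of column numbers and a bitmask of updated columns); they are not the signatures used in the patch. Only the inverted return values mirror what the review describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: bit N is set when column N was modified by the UPDATE. */
typedef unsigned int SketchColumnSet;

/*
 * Walker-style helper: returns true as soon as it finds an indexed column
 * that the UPDATE modified. With the suggested name, "true" plainly means
 * "changed".
 */
static bool
sketch_index_expression_changed_walker(const int *indexed_columns,
                                       int ncolumns,
                                       SketchColumnSet updated)
{
    assert(ncolumns >= 0);

    for (int i = 0; i < ncolumns; i++)
    {
        assert(indexed_columns[i] >= 0 && indexed_columns[i] < 32);
        if (updated & (1u << indexed_columns[i]))
            return true;        /* found a changed key column */
    }
    return false;               /* no indexed column was touched */
}

/*
 * Caller: reports true when the index is logically unchanged, i.e. the
 * negation of the walker's result.
 */
bool
sketch_index_unchanged_by_update(const int *indexed_columns,
                                 int ncolumns,
                                 SketchColumnSet updated)
{
    return !sketch_index_expression_changed_walker(indexed_columns,
                                                   ncolumns,
                                                   updated);
}
```

With the walker named after what its true result means, the negation in the caller reads naturally, which is the whole motivation for the rename.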
v12-0002
--------
1. Thanks for the comments, they're well made and do help to read the code.
2. I'm not sure the bottomup_delete_items parameter is very helpful. In order to disable
bottom-up deletion, a DBA needs to somehow measure its impact on a particular index.
Currently I do not see how to achieve this. Not sure if this is overly important, though, as
you have a similar parameter for the deduplication.
3. It feels like it would be better to rename indexUnchanged to indexChanged and negate its usage in the code,
as !indexChanged reads more naturally than !indexUnchanged, at least to me.
This is all I have. I agree that this code is pretty close to being committed.
Now for the tests.
First, I ran a 2-hour long case with the same setup as I used in my e-mail from 15 November.
I found no difference between patch and master whatsoever, which makes me think that current
master is quite good at keeping bloat under control (not sure if this is an effect of
the 4228817449 commit or of deduplication).
I created another setup (see attached testcases). Basically, I emulated queue operations (INSERT at the end and DELETE
Victor Yegorov