Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAH2-WzmQcpHEATvgR4bb0aO8bWDggPJVabw488arYe9L+72V5A@mail.gmail.com
Whole thread Raw
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Victor Yegorov <vyegorov@gmail.com>)
Responses Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Nov 17, 2020 at 7:24 AM Victor Yegorov <vyegorov@gmail.com> wrote:
> I've looked through the code and it looks very good from my end:
> - plenty comments, good description of what's going on
> - I found no loose ends in terms of AM integration
> - magic constants replaced with defines
> Code looks good. Still, it'd be good if somebody with more experience could look into this patch.

Great, thank you.

> Question: why in the comments you're using double spaces after dots?
> Is this a convention of the project?

Not really. It's based on my habit of trying to be as consistent as
possible with existing code.

There seems to be a weak consensus among English speakers on this
question, which is: the two space convention is antiquated, and only
ever made sense in the era of mechanical typewriters. I don't really
care either way, and I doubt that any other committer pays much
attention to these things. You may have noticed that I use only one
space in my e-mails.

Actually, I probably shouldn't care about it myself. It's just what I
decided to do at some point. I find it useful to decide that this or
that practice is now a best practice, and then stick to it without
thinking about it very much (this frees up space in my head to think
about more important things). But this particular habit of mine around
spaces is definitely not something I'd insist on from other
contributors. It's just that: a habit.

> I am thinking of two more scenarios that require testing:
> - queue in the table, with a high rate of INSERTs+DELETEs and a long transaction.

I see your point. This is going to be hard to make work outside of
unique indexes, though. Unique indexes are already not dependent on
the executor hint -- they can just use the "uniquedup" hint. The code
for unique indexes is prepared to notice duplicates in
_bt_check_unique() in passing, and apply the optimization for that
reason.

Maybe there is some argument to forgetting about the hint entirely,
and always assuming that we should try to find tuples to delete at the
point that a page is about to be split. I think that that argument is
a lot harder to make, though. And it can be revisited in the future.
It would be nice to do better with INSERTs+DELETEs, but that's surely
not the big problem for us right now.

I realize that this unique indexes/_bt_check_unique() thing is not
even really a partial fix to the problem you describe. The indexes
that have real problems with such an INSERTs+DELETEs workload will
naturally not be unique indexes -- _bt_check_unique() already does a
fairly good job of controlling bloat without bottom-up deletion.

> - upgraded cluster with !heapkeyspace indexes.

I do have a patch that makes that easy to test, that I used for the
Postgres 13 deduplication work -- I can rebase it and post it if you
like. You will be able to apply the patch, and run the regression
tests with a !heapkeyspace index. This works with only one or two
tweaks to the tests (IIRC the amcheck tests need to be tweaked in one
place for this to work). I don't anticipate that !heapkeyspace indexes
will be a problem, because they won't use any of the new stuff anyway,
and because nothing about the on-disk format is changed by bottom-up
index deletion.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Chapman Flack
Date:
Subject: Re: pl/pgsql feature request: shorthand for argument and local variable references
Next
From: Robert Haas
Date:
Subject: Re: Protect syscache from bloating with negative cache entries