Re: PG 12 draft release notes - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: PG 12 draft release notes |
Date | |
Msg-id | CAH2-WznBzM3H4B_iB5E8dAP7Br7tDRM3Bhf4hPxQ2oZfLcmg+A@mail.gmail.com |
In response to | Re: PG 12 draft release notes (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: PG 12 draft release notes |
List | pgsql-hackers |
On Wed, Jun 12, 2019 at 5:22 PM Bruce Momjian <bruce@momjian.us> wrote:
> I had become so confused by this item that I needed a few weeks to
> settle on what was actually going on.

I put a lot of time into my PGCon talk, especially on the diagrams. Seems like that paid off. Even Heikki was confused by my explanations at one point.

I should go add a similar diagram to our documentation, under "Chapter 63. B-Tree Indexes", because diagrams are the only sensible way to explain the concepts.

> I was wrong. I was thinking of this commit:
>
>     commit d2086b08b0
>     Author: Alexander Korotkov <akorotkov@postgresql.org>
>     Date: Sat Jul 28 00:31:40 2018 +0300
>
>         Reduce path length for locking leaf B-tree pages during insertion
>
> > If you had to cut one thing from this list, then I would suggest that
> > it be this item. It's nice, but it's also very obvious, which makes it
> > hard to explain.

> Right. The commit mentioned a 4.5x speedup in a rare benchmark, so I
> added it lower on the list.

My remark about cutting an item referred to a lesser item that I worked on (the 'Add nbtree high key "continuescan" optimization' commit), not Alexander's independent B-Tree work. I think that Alexander's optimization is also quite effective. Though FWIW the 4.5x improvement concerned a case involving lots of duplicates... cases with a lot of duplicates will be far, far better in Postgres 12. (I never tested my patch without Alexander's commit, since it went in early in the v12 cycle.)

> Yes, locality.

"Locality" is one of my favorite words.

> Attached is an updated patch. I might have missed something, but I
> think it might be close.

This looks great. I do have a few things:

* I would put "Improve performance and space utilization of btree indexes with many duplicates" first (before "Allow multi-column btree indexes to be smaller"). I think that indexes with many duplicates are far more common than we tend to assume, and that is also where the biggest benefits are.

* The wording of the "many duplicates" item itself is almost perfect, though the "...and inefficiency when VACUUM needed to find a row for removal" part seems a bit off -- this is really about the effectiveness of VACUUM, not the speed at which VACUUM completes (it's a bit faster, but that's not that important). Perhaps that part should read: "...and often failed to efficiently recycle space made available by VACUUM". Something like that.

* The "Allow multi-column btree indexes to be smaller" item is about both suffix truncation and the "Split after new tuple" optimization. I think that makes it more complicated than it needs to be. While the improvements that we saw with TPC-C on account of the "Split after new tuple" optimization were nice, I doubt that users will be looking out for it. I would be okay if you dropped any mention of the "Split after new tuple" optimization, in the interest of making the description more useful to users. We can just lose that.

* Once you simplify the item by making it all about suffix truncation, it would make sense to change the single line summary to "Reduce the number of branch blocks needed for multi-column indexes". Then go on to talk about how we now only store those columns that are necessary to guide index scans in tuples stored in branch pages (we tend to call branch pages internal pages, but branch pages seems friendlier to me). Note that the user docs of other database systems reference these details, even in their introductory material on how B-Tree indexes work. The term "suffix truncation" isn't something users have heard of, and we shouldn't use it here, but the *idea* of suffix truncation is very well established. As I mentioned, it matters for things like covering indexes (indexes that are designed to be used by index-only scans, which are not necessarily INCLUDE indexes).

Thanks!
--
Peter Geoghegan
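As a rough illustration of the suffix truncation idea from the last item above: when a leaf page splits, the separator key stored in the branch page only needs enough leading columns to tell the left half from the right half. The sketch below is a toy model in Python, not PostgreSQL's nbtree code. The function name `truncate_separator` and the sample column values are hypothetical, and it assumes the two tuples are not exact duplicates (a real implementation needs a tiebreaker for that case).

```python
from typing import Sequence, Tuple


def truncate_separator(last_left: Sequence, first_right: Sequence) -> Tuple:
    """Return the shortest prefix of first_right that still sorts after last_left.

    last_left   -- last index tuple that stays on the left page after a split
    first_right -- first index tuple that moves to the right page
    """
    if len(last_left) != len(first_right):
        raise ValueError("both tuples must have the same number of columns")
    if tuple(last_left) >= tuple(first_right):
        raise ValueError("last_left must sort strictly before first_right")

    for i, (left_col, right_col) in enumerate(zip(last_left, first_right)):
        if left_col != right_col:
            # Columns after the first differing one are not needed to
            # guide a scan to the correct page, so they are dropped.
            return tuple(first_right[: i + 1])

    # Not reachable: tuples of equal length that compare strictly
    # less-than must differ in at least one column.
    raise AssertionError("no distinguishing column found")


if __name__ == "__main__":
    # Three-column index on (last_name, first_name, customer_id).
    left = ("Smith", "Alice", 1001)
    right = ("Smith", "Bob", 17)
    # Only the first two columns are needed in the branch page.
    print(truncate_separator(left, right))
```

In this model the third column never reaches the branch page, which is why fewer branch blocks are needed for multi-column indexes whose leading columns already distinguish neighboring pages.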