Re: Faster inserts with mostly-monotonically increasing values - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Faster inserts with mostly-monotonically increasing values
Msg-id: CAH2-Wz=5eZr0+h8wZozpHEJkoYS-T05qW_Z00JWRh3md3ofGCA@mail.gmail.com
In response to: Re: Faster inserts with mostly-monotonically increasing values (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers
On Mon, Mar 5, 2018 at 5:48 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> From what I read, both phase 1 & 2 are served by the !P_IGNORE check.

True.

> For the third phase:
>
>> A deleted page can only be reclaimed once there is no scan or search that
>> has a reference to it; until then, it must stay in place with its
>> right-link undisturbed.
>
> That's because:
>
>> A deleted page cannot be reclaimed immediately, since there may be other
>> processes waiting to reference it (ie, search processes that just left the
>> parent, or scans moving right or left from one of the siblings). These
>> processes must observe that the page is marked dead and recover
>> accordingly. Searches and forward scans simply follow the right-link
>> until they find a non-dead page --- this will be where the deleted page's
>> key-space moved to.
>
> But in this case there's no right link to follow, so it's a non-issue.
>
> BTree doesn't truncate AFAIK, so the cached block number can't be
> nonexistent (beyond the end of the relation). It's probably fragile to
> assume BTree will never truncate, so maybe it would be wise to check
> for that. But if the block exists, I believe it contains a page that
> can be safely interpreted by that condition. As long as there can be
> no other pages with the same flags on the index while the cached page
> is write-locked, that code will be safe.

Introducing any case that allows us to land on a recycled page, and
reason that it must at least not be the page we were looking for based
on *any* criteria about the page itself, seems very brittle. Yeah, it
probably won't break in practice, but it's a bad design.

> Yes, if the scankey is equal to the leftmost key, the first occurrence
> of that key could be on the left, so the check would have to be for
> strictly greater. But that's about as complex as it would get.

That's pretty complicated.

>> I didn't say that the patch is necessarily wrong here. Just that it
>> makes me nervous.
> Any change to btree would make anyone nervous ;-)

True. Anyway, this isn't the thing that I'm most concerned about right now.

>> I didn't get what the point of checking the first item on the page
>> (your proposal) was.
>
> Right now, if you've got concurrent transactions, there's absolutely
> no chance this optimization would kick in. For it to happen,
> insertions on the index need to happen in order, each insertion a key
> greater than any other key (visible or not) on the index.
>
> Concurrent inserts on PKs are unlikely to arrive in order; a slight
> disorder is to be expected, as serial sequence numbers get generated a
> significant amount of time before the insert actually takes place.
> Different backends can get to the index insertion at different times,
> causing unordered inserts.
>
> I believe PKs are a prime candidate for this optimization, and
> expecting it to apply only when no concurrency is involved is severely
> dumbing down the optimization.

Pavan justified the patch using a benchmark that only involved a
single client -- hardly typical for a patch that changes the B-Tree
code. If the benefits with many clients can be shown to matter, that
will make this much more interesting to me.

--
Peter Geoghegan