Re: Questions regarding Index AMs and natural ordering - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Questions regarding Index AMs and natural ordering |
Date | |
Msg-id | CAH2-WzmWn2_eS_4rWy90DRzC-NW-oponONR6PwMqy+OOuvVyFA@mail.gmail.com Whole thread Raw |
In response to | Re: Questions regarding Index AMs and natural ordering (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: Questions regarding Index AMs and natural ordering
|
List | pgsql-hackers |
On Fri, Nov 24, 2023 at 8:44 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > Also, was that a confirmation that amcanorder is a requirement for the > AM to return data in index order (unless amrescan's orderbys is not > null), or just a comment on the reason for the name of 'amcanorder' > being unclear? It was both, I suppose. > > Cases where the planner uses a merge join often involve an index path > > with an "interesting sort order" that "enables" the merge join. > > Perhaps most of the alternative plans (that were almost as cheap as > > the merge join plan) would have had to scan the same index in the same > > way anyway, so it ends up making sense to use a merge join. The merge > > join can get ordered results from the index "at no extra cost". (All > > of this is implicit, of course -- the actual reason why the planner > > chose the merge join plan is because it worked out to be the cheapest > > plan.) > > Couldn't the merge join (or scan node) use a tuple store to return to > some earlier point in the index scan when a native version of markpos > is not easily supported? You can materialize any executor node, allowing it to be accessed in random order, as required by merge joins (in many cases). But any index AM that relies on that clearly isn't an amcanorder=true index AM, which is what you asked about. Whether or not you should actually care about whether your index AM can meet the expectations that the API has for amcanorder=true index AMs is far from clear. In the end the design has to make sense, and integrate into the existing API in some way or other -- but the details are likely to depend on things that nobody thought of just yet. I don't think that it's all that useful to discuss it while everything remains so abstract. > It would add (potentially very significant) > IO overhead, but it would also allow merge joins on ordered paths that > currently don't have a natural way of marking their position. I don't know. Maybe it's possible. It might even be practically achievable, while delivering a compelling benefit to users. This is the kind of thing that I don't tend to speculate about, at least not before I have more concrete information about what is possible through some kind of prototyping process. > Yes, the part where btree opclasses determine a type's ordering is > clear. But what I'm looking for is "how do I, as an index AM > implementation, get the signal that I need to return column-ordered > data?" If that signal is "index AM marked amcanorder == index must > return ordered data", then that's suboptimal for the index AM writer, > but understandable. If amcanorder does not imply always ordered > retrieval, then I'd like to know how it is signaled to the AM. But as > of yet I've not found a clear and conclusive answer either way. I suppose that amcanorder=true cannot mean that, since we have the SAOP path key thing (at least for now). But that wasn't true until bug fix commit 807a40c5, so prior to that point I guess it was tacitly the case that B-Tree scans always returned ordered results. Here we are trying to divine the intentions of an "abstract" API by discussing highly obscure bugs that were either bugs in nbtree, or bugs in what we once expected of nbtree. Seems more than a bit silly to me. I'm not suggesting that the idea of an abstract API doesn't need to be taken seriously at all -- far from it. Just that "case law precedent" can play an important role in how it is interpreted. The fact that some things remain ambiguous isn't necessarily a problem that needs to be solved. If there is a real problem, then IMV it should always start with a concrete complaint about how the API doesn't support certain requirements. In other words, I'm of the opinion that such a complaint should end with the API, as opposed to starting with it. -- Peter Geoghegan
pgsql-hackers by date: