Re: Questions regarding Index AMs and natural ordering - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Questions regarding Index AMs and natural ordering
Date
Msg-id CAH2-WzmWn2_eS_4rWy90DRzC-NW-oponONR6PwMqy+OOuvVyFA@mail.gmail.com
Whole thread Raw
In response to Re: Questions regarding Index AMs and natural ordering  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: Questions regarding Index AMs and natural ordering
List pgsql-hackers
On Fri, Nov 24, 2023 at 8:44 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Also, was that a confirmation that amcanorder is a requirement for the
> AM to return data in index order (unless amrescan's orderbys is not
> null), or just a comment on the reason for the name of 'amcanorder'
> being unclear?

It was both, I suppose.

> > Cases where the planner uses a merge join often involve an index path
> > with an "interesting sort order" that "enables" the merge join.
> > Perhaps most of the alternative plans (that were almost as cheap as
> > the merge join plan) would have had to scan the same index in the same
> > way anyway, so it ends up making sense to use a merge join. The merge
> > join can get ordered results from the index "at no extra cost". (All
> > of this is implicit, of course -- the actual reason why the planner
> > chose the merge join plan is because it worked out to be the cheapest
> > plan.)
>
> Couldn't the merge join (or scan node) use a tuple store to return to
> some earlier point in the index scan when a native version of markpos
> is not easily supported?

You can materialize any executor node, allowing it to be accessed in
random order, as required by merge joins (in many cases). But any
index AM that relies on that clearly isn't an amcanorder=true index
AM, which is what you asked about.

Whether or not you should actually care about whether your index AM
can meet the expectations that the API has for amcanorder=true index
AMs is far from clear. In the end the design has to make sense, and
integrate into the existing API in some way or other -- but the
details are likely to depend on things that nobody thought of just
yet. I don't think that it's all that useful to discuss it while
everything remains so abstract.

> It would add (potentially very significant)
> IO overhead, but it would also allow merge joins on ordered paths that
> currently don't have a natural way of marking their position.

I don't know. Maybe it's possible. It might even be practically
achievable, while delivering a compelling benefit to users. This is
the kind of thing that I don't tend to speculate about, at least not
before I have more concrete information about what is possible through
some kind of prototyping process.

> Yes, the part where btree opclasses determine a type's ordering is
> clear. But what I'm looking for is "how do I, as an index AM
> implementation, get the signal that I need to return column-ordered
> data?" If that signal is "index AM marked amcanorder == index must
> return ordered data", then that's suboptimal for the index AM writer,
> but understandable. If amcanorder does not imply always ordered
> retrieval, then I'd like to know how it is signaled to the AM. But as
> of yet I've not found a clear and conclusive answer either way.

I suppose that amcanorder=true cannot mean that, since we have the
SAOP path key thing (at least for now). But that wasn't true until bug
fix commit 807a40c5, so prior to that point I guess it was tacitly the
case that B-Tree scans always returned ordered results.

Here we are trying to divine the intentions of an "abstract" API by
discussing highly obscure bugs that were either bugs in nbtree, or
bugs in what we once expected of nbtree. Seems more than a bit silly
to me.

I'm not suggesting that the idea of an abstract API doesn't need to be
taken seriously at all -- far from it. Just that "case law precedent"
can play an important role in how it is interpreted. The fact that
some things remain ambiguous isn't necessarily a problem that needs to
be solved. If there is a real problem, then IMV it should always start
with a concrete complaint about how the API doesn't support certain
requirements. In other words, I'm of the opinion that such a complaint
should end with the API, as opposed to starting with it.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] pg_upgrade vs vacuum_cost_delay
Next
From: Tom Lane
Date:
Subject: Re: Questions regarding Index AMs and natural ordering