Re: Incremental sort for access method with ordered scan support (amcanorderbyop) - Mailing list pgsql-hackers

From Jonathan S. Katz
Subject Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
Date
Msg-id 3122b1ba-b5e5-ad92-0894-b7a8c639f1ff@postgresql.org
Whole thread Raw
In response to Re: Incremental sort for access method with ordered scan support (amcanorderbyop)  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On 7/5/23 2:15 AM, Richard Guo wrote:
> 
> On Tue, Jul 4, 2023 at 7:15 PM David Rowley <dgrowleyml@gmail.com 
> <mailto:dgrowleyml@gmail.com>> wrote:
> 
>     On Tue, 4 Jul 2023 at 20:12, Richard Guo <guofenglinux@gmail.com
>     <mailto:guofenglinux@gmail.com>> wrote:
>      > The v4 patch looks good to me (maybe some cosmetic tweaks are still
>      > needed for the comments).  I think it's now 'Ready for Committer'.
> 
>     I agree. I went and hit the comments with a large hammer and while
>     there also adjusted the regression tests. I didn't think having "t" as
>     a table name was a good idea as it seems like a name with a high risk
>     of conflicting with a concurrently running test. Also, there didn't
>     seem to be much need to insert data into that table as the tests
>     didn't query any of it.
> 
>     The only other small tweak I made was to not call list_copy_head()
>     when the list does not need to be shortened. It's likely not that
>     important, but if the majority of cases are not partial matches, then
>     we'd otherwise be needlessly making copies of the list.
> 
>     I pushed the adjusted patch.
> 
> 
> The adjustments improve the patch a lot.  Thanks for adjusting and
> pushing the patch.

Thanks for working on this! While it allows the planner to consider 
choosing an incremental sort for indexes that implement 
"amcanorderbyop", it also has a positive side-effect that the planner 
will also consider choosing a plan for spawning parallel workers!

Because of that, I'd like to open the discussion that we consider 
backpatching this. Currently, extensions that implement index access 
methods (e.g. pgvector[1]) that are built primarily around 
"amcanorderbyop" are unable to get the planner to consider choosing a 
parallel scan, i.e. at this point in "create_order_paths"[2]:

/*
* If cheapest partial path doesn't need a sort, this is redundant
* with what's already been tried.
*/
if (!pathkeys_contained_in(root->sort_pathkeys,
                            cheapest_partial_path->pathkeys))

However, 625d5b3c does unlock this path for these types of indexes to 
allow for a parallel index scan to be chosen, which would allow 
extensions that implement a "amcanorderbyop" scan to use it. I would 
argue that this is a bug, given we offer the ability for index access 
methods to implement parallel index scans.

That said, I do think they may still need to be one planner tweak to 
properly support parallel index scan in this case, as I have yet to see 
costs generated where the parallel index scan is cheaper. However, I 
have not yet narrowed what/where that is.

Thanks,

Jonathan

[1] https://github.com/pgvector/pgvector
[2] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/optimizer/plan/planner.c;#l5188

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL
Next
From: Dave Cramer
Date:
Subject: Re: CommandStatus from insert returning when using a portal.