Thanks for having a look at this.
On Tue, 10 Jan 2023 at 02:28, Richard Guo <guofenglinux@gmail.com> wrote:
> +1 for the changes. A minor comment is that previously on HEAD for
> SELECT DISTINCT case, if we have to do an explicit full sort atop the
> cheapest path, we try to make sure to always use the more rigorous
> ordering.
I'm not quite sure I follow what's changed here. As far as I see it
the code still does what it did and uses the more rigorous sort.
postgres=# explain (costs off) select distinct on (relname) * from
pg_Class order by relname, oid;
QUERY PLAN
----------------------------------
Unique
-> Sort
Sort Key: relname, oid
-> Seq Scan on pg_class
If it didn't, then there'd have been a Sort atop of the Unique to
ORDER BY relname,oid in the above.
Maybe you were looking at create_partial_distinct_paths()? We don't do
anything there for DISTINCT ON, although perhaps we could. Just not
for this patch.
>
> /* For explicit-sort case, always use the more rigorous clause */
> if (list_length(root->distinct_pathkeys) <
> list_length(root->sort_pathkeys))
> {
> needed_pathkeys = root->sort_pathkeys;
> /* Assert checks that parser didn't mess up... */
> Assert(pathkeys_contained_in(root->distinct_pathkeys,
> needed_pathkeys));
> }
> else
> needed_pathkeys = root->distinct_pathkeys;
>
> I'm not sure if this is necessary, as AFAIU the parser should have
> ensured that the sortClause is always a prefix of distinctClause.
I think that's true for standard DISTINCT, but it's not for DISTINCT ON.
> In the patch this code has been removed. I think we should also remove
> the related comments in create_final_distinct_paths.
>
> * When we have DISTINCT ON, we must sort by the more rigorous of
> * DISTINCT and ORDER BY, else it won't have the desired behavior.
> - * Also, if we do have to do an explicit sort, we might as well use
> - * the more rigorous ordering to avoid a second sort later. (Note
> - * that the parser will have ensured that one clause is a prefix of
> - * the other.)
I'm not quite following what you think has changed here.
> Also, the comment just above this one is outdated too.
>
> * First, if we have any adequately-presorted paths, just stick a
> * Unique node on those. Then consider doing an explicit sort of the
> * cheapest input path and Unique'ing that.
>
> The two-step workflow is what is the case on HEAD but not any more in
> the patch. And I think we should mention incremental sort on any paths
> with presorted keys.
Yeah, I agree, that comment should mention incremental sort.
I've attached an updated patch
David