While working on the regression tests added in a14a58329, I noticed that DISTINCT does not make use of Incremental Sort. It only ever does a full sort on the cheapest input path, or uses a path that already has the required pathkeys. Also, create_final_distinct_paths() is a little quirky: if the cheapest input path happens to be sorted, it'll add_path() the same path twice, which seems like a waste of effort. That could happen if, say, enable_seqscan is off or a Merge Join is the cheapest join method.
Additionally, the parallel DISTINCT code looks like it should get the same treatment. I see that I'd coded this to only add a unique path atop a presorted path, so it never considers sorting the cheapest partial path. I've adjusted that in the attached and also made it consider incremental sorting any path with presorted keys.
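To make the intended behaviour concrete, here is a minimal standalone sketch of the per-path decision described above. It is a toy model, not the patch: pathkeys are reduced to plain integers, and the names count_presorted_keys(), choose_sort_strategy(), and the SortStrategy enum are hypothetical. The rule that a path with no usable presorted keys is only worth a full sort when it is the cheapest path is my reading of the description, stated here as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome for one input path (not a PostgreSQL type). */
typedef enum
{
    SORT_NONE,          /* path already fully sorted; just add Unique */
    SORT_INCREMENTAL,   /* some leading keys presorted; Incremental Sort */
    SORT_FULL,          /* nothing usable presorted; full Sort */
    SORT_SKIP           /* not worth considering this path at all */
} SortStrategy;

/* Count how many leading entries of 'needed' are matched by 'have'. */
size_t
count_presorted_keys(const int *needed, size_t n_needed,
                     const int *have, size_t n_have)
{
    size_t n = 0;

    while (n < n_needed && n < n_have && needed[n] == have[n])
        n++;
    return n;
}

/*
 * Decide what to put on top of one input path.  Assumption: a full sort
 * is only considered for the cheapest path; other paths are only
 * interesting if an incremental sort can exploit their ordering.
 */
SortStrategy
choose_sort_strategy(const int *needed, size_t n_needed,
                     const int *have, size_t n_have,
                     bool is_cheapest, bool incremental_enabled)
{
    size_t presorted = count_presorted_keys(needed, n_needed, have, n_have);

    if (presorted == n_needed)
        return SORT_NONE;
    if (presorted > 0 && incremental_enabled)
        return SORT_INCREMENTAL;
    return is_cheapest ? SORT_FULL : SORT_SKIP;
}
```

With needed keys (1, 2, 3), a path sorted by (1) alone would get an incremental sort, while an unsorted path would get a full sort only if it is the cheapest one.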
Please see the attached patch.
+1 for the changes. A minor comment: previously on HEAD, for the SELECT DISTINCT case, if we had to do an explicit full sort atop the cheapest path, we made sure to always use the more rigorous ordering.
    /* For explicit-sort case, always use the more rigorous clause */
    if (list_length(root->distinct_pathkeys) <
        list_length(root->sort_pathkeys))
    {
        needed_pathkeys = root->sort_pathkeys;
        /* Assert checks that parser didn't mess up... */
        Assert(pathkeys_contained_in(root->distinct_pathkeys,
                                     needed_pathkeys));
    }
    else
        needed_pathkeys = root->distinct_pathkeys;
I'm not sure if this is necessary, as AFAIU the parser should have ensured that the sortClause is always a prefix of distinctClause.
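For reference, the rule in that HEAD snippet boils down to "pick the longer key list, and check that the shorter one is a prefix of it". A minimal standalone sketch follows, with pathkeys reduced to integers. KeyList, is_prefix_of(), and more_rigorous() are hypothetical names used only for illustration and are not PostgreSQL APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list of pathkeys. */
typedef struct KeyList
{
    const int  *keys;
    size_t      len;
} KeyList;

/* True if every key of 'a' appears, in order, at the start of 'b'. */
bool
is_prefix_of(KeyList a, KeyList b)
{
    if (a.len > b.len)
        return false;
    for (size_t i = 0; i < a.len; i++)
    {
        if (a.keys[i] != b.keys[i])
            return false;
    }
    return true;
}

/* Return whichever ordering has more keys, as the HEAD snippet does. */
KeyList
more_rigorous(KeyList distinct_keys, KeyList sort_keys)
{
    return (distinct_keys.len < sort_keys.len) ? sort_keys : distinct_keys;
}
```

If one list is always a prefix of the other, then sorting by the longer list satisfies both orderings, which is why the explicit-sort case on HEAD could avoid a second sort later.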
In the patch this code has been removed. I think we should also remove the related comments in create_final_distinct_paths.
      * When we have DISTINCT ON, we must sort by the more rigorous of
      * DISTINCT and ORDER BY, else it won't have the desired behavior.
-     * Also, if we do have to do an explicit sort, we might as well use
-     * the more rigorous ordering to avoid a second sort later.  (Note
-     * that the parser will have ensured that one clause is a prefix of
-     * the other.)
Also, the comment just above this one is outdated too.
      * First, if we have any adequately-presorted paths, just stick a
      * Unique node on those.  Then consider doing an explicit sort of the
      * cheapest input path and Unique'ing that.
That two-step workflow describes HEAD, but it no longer matches what the patch does. I think the comment should also mention incremental sort on any paths with presorted keys.