Re: [PATCH] Teach planner to further optimize sort in distinct - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: [PATCH] Teach planner to further optimize sort in distinct |
Date | |
Msg-id | CAApHDvr_tfwbkcmiPJtM+SNz81KcP0efJKRmu0hEEcKS0BQsXw@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Teach planner to further optimize sort in distinct (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Responses |
Re: [PATCH] Teach planner to further optimize sort in distinct
|
List | pgsql-hackers |
On Fri, 20 Jan 2023 at 08:26, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: > > On 19/01/23 18:49, David Rowley wrote: > > You can just switch to using that function in > > create_final_distinct_paths(). You'll need to consider if the query is > > a DISTINCT ON query and not try the unordered version of the function > > in that case. > > Tried this, it worked as expected. Tests are green as well. Looking at the patch, you've not added any additional tests. If the existing tests are all passing then that just tells me that either the code is not functioning as intended or we have no tests that look at the EXPLAIN output which can make use of this optimization. If the former is true, then the patch needs to be fixed. If it's the latter then you need to write new tests. > > It's possible that truncate_useless_pathkeys() needs to be modified to > > check if the pathkeys might be useful for DISTINCT > > We have pathkeys_useful_for_merging and pathkeys_useful_for_ordering. > > Can we not have pathkeys_useful_for_distinct? I don't know all the repercussions. If you look at add_path() then you'll see we do a pathkey comparison when the costs are not fuzzily different from an existing path so that we try to keep a path with the best pathkeys. If we start keeping paths around with other weird pathkeys that are not along the lines of the query_pathkeys requires, then add_path might start throwing away paths that are actually good for something. It seems probable that could cause some regressions. > I have attached patch with pathkeys_count_contained_in_unordered > and corresponding changes in create_final_distinct_paths for reference. Does this patch actually work? I tried: create table ab (a int, b int); insert into ab select a,b from generate_Series(1,1000) a,generate_series(1,1000) b; analyze ab; create index on ab(a); set enable_hashagg=0; explain select distinct b,a from ab where a < 10; QUERY PLAN ------------------------------------------------------------------------------------ Unique (cost=729.70..789.67 rows=7714 width=8) -> Sort (cost=729.70..749.69 rows=7996 width=8) Sort Key: b, a -> Index Scan using ab_a_idx on ab (cost=0.42..211.36 rows=7996 width=8) Index Cond: (a < 10) (5 rows) I'd have expected an incremental sort here. I don't see that you're adjusting IndexPath's pathkeys anywhere. The nested loop in pathkeys_count_contained_in_unordered() seems to be inside out. The reordered_pathkeys must have the common pathkeys in the order they appear in keys2. In your patch, they'll be ordered according to the keys1 list, which is wrong. Testing would tell you this, so all the more reason to make the patch work and write some queries to ensure it does actually work, then some tests to ensure that remains in a working state. Feel free to take the proper time to write a working patch which contains new tests to ensure it's functioning as intended. It's disheartening to review patches that don't seem to work. If it wasn't meant to work, then you didn't make that clear. I'll likely not be able to do any further reviews until the March commitfest, so it might be better to only post if you're stuck. Please don't rush out the next patch. Take your time and study the code and see if you can build up your own picture for what the repercussions might be of IndexPaths having additional pathkeys when they were previously empty. If you're uncertain of aspects of the patch you've written feel free to leave XXX type comments to indicate this. That way the reviewer will know you might need more guidance on that and you'll not forget yourself when you come back and look again after a few weeks. David
pgsql-hackers by date: