Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCH] Erase the distinctClause if the result is unique by definition |
Date | |
Msg-id | 2372.1583095573@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [PATCH] Erase the distinctClause if the result is unique by definition (Andy Fan <zhihui.fan1213@gmail.com>) |
Responses |
Re: [PATCH] Erase the distinctClause if the result is unique by definition
|
List | pgsql-hackers |
Andy Fan <zhihui.fan1213@gmail.com> writes: > Please see if you have any comments. Thanks The cfbot isn't at all happy with this. Its linux build is complaining about a possibly-uninitialized variable, and then giving up: https://travis-ci.org/postgresql-cfbot/postgresql/builds/656722993 The Windows build isn't using -Werror, but it is crashing in at least two different spots in the regression tests: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81778 I've not attempted to identify the cause of that. At a high level, I'm a bit disturbed that this focuses only on DISTINCT and doesn't (appear to) have any equivalent intelligence for GROUP BY, though surely that offers much the same opportunities for optimization. It seems like it'd be worthwhile to take a couple steps back and see if we couldn't recast the logic to work for both. Some other random comments: * Don't especially like the way you broke query_is_distinct_for() into two functions, especially when you then introduced a whole lot of other code in between. That's just making reviewer's job harder to see what changed. It makes the comments a bit disjointed too, that is where you even had any. (Zero introductory comment for query_is_distinct_agg is *not* up to project coding standards. There are a lot of other undercommented places in this patch, too.) * Definitely don't like having query_distinct_through_join re-open all the relations. The data needed for this should get absorbed while plancat.c has the relations open at the beginning. (Non-nullness of columns, in particular, seems like it'll be useful for other purposes; I'm a bit surprised the planner isn't using that already.) * In general, query_distinct_through_join seems hugely messy, expensive, and not clearly correct. If it is correct, the existing comments sure aren't enough to explain what it is doing or why. * Not entirely convinced that a new GUC is needed for this, but if it is, you have to document it. * I wouldn't bother with bms_array_free(), nor with any of the other cleanup you've got at the bottom of query_distinct_through_join. The planner leaks *lots* of memory, and this function isn't going to be called so many times that it'll move the needle. * There seem to be some pointless #include additions, eg in planner.c the added code doesn't look to justify any of them. Please also avoid unnecessary whitespace changes, those also slow down reviewing. * I see you decided to add a new regression test file select_distinct_2. That's a poor choice of name because it conflicts with our rules for the naming of alternative output files. Besides which, you forgot to plug it into the test schedule files, so it isn't actually getting run. Is there a reason not to just add the new test cases to select_distinct? * There are some changes in existing regression cases that aren't visibly related to the stated purpose of the patch, eg it now notices that "select distinct max(unique2) from tenk1" doesn't require an explicit DISTINCT step. That's not wrong, but I wonder if maybe you should subdivide this patch into more than one patch, because that must be coming from some separate change. I'm also wondering what caused the plan change in expected/join.out. regards, tom lane
pgsql-hackers by date: