Problem with CVS HEAD's handling of mergejoins - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Problem with CVS HEAD's handling of mergejoins |
Date | |
Msg-id | 24443.1199844480@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Problem with CVS HEAD's handling of mergejoins
Re: Problem with CVS HEAD's handling of mergejoins |
List | pgsql-hackers |
So I adjusted the patch I was working on as suggested here http://archives.postgresql.org/pgsql-hackers/2008-01/msg00251.php and things started blowing up all over the place --- Assert failures, "too few pathkeys for mergeclauses" errors, etc :-( On investigation, the problem seems to be a bit of brain-fade on my part. The planner uses "PathKeys" lists both to represent what's known about the output sort order of a Path, and to represent per-merge-clause ordering data for a merge join. A PathKeys list that represents sort ordering ought to be "canonical", meaning it contains no redundant keys --- a key might be redundant because it is the same as some prior key in the list ("ORDER BY x,x") or because it is known equal to a constant and thus uninteresting for sorting ("WHERE x = 1 ... ORDER BY x"). However, there are several places in the planner that expect the pathkeys for a merge join to be one-for-one with the selected merge clauses. I'm not sure why we didn't come across test cases exposing this problem earlier in beta. A partial explanation is that the equal-to-a-constant case was unlikely to arise before, because given "WHERE x = y AND x = 1" the code up to now would get rid of "x = y" altogether and then never try for a mergejoin; my patch to eliminate that behavior was what exposed the problem. But there are other ways to get redundant keys in a list of candidate mergejoin clauses. One possibility seems to be to keep a list of "raw" (not canonical) pathkeys for each side of a list of proposed mergejoin clauses, which is one-to-one with the clauses, with the clear understanding that the canonical list that describes the path's sort ordering might be just a subset of this list. This would mean a couple of new fields in MergePath structs, but fortunately no on-disk format changes since MergePaths never get to disk. A perhaps less invasive idea is to discard any proposed mergeclauses that are redundant in this sense. This would still require some reshuffling of responsibility between select_mergejoin_clauses and the code in pathkeys.c, since right now select_mergejoin_clauses takes no account of that. However, I'm worried that that might result in planner failure on some FULL JOIN cases that work today, since we require all the join clauses to be mergejoinable for a FULL JOIN. People seem to complain when the planner fails, even for really stupid queries ;-). I think this would only be acceptable if we can prove that ignoring clauses that are redundant in this sense doesn't change the result --- which might be the case, but I'm not sure. I think I can fix this in a day or so, but I now definitely feel that we'll need an RC2 :-( regards, tom lane
pgsql-hackers by date: