Indexes with duplicate columns - Mailing list pgsql-hackers

From Tom Lane
Subject Indexes with duplicate columns
Date
Msg-id 16303.1324574256@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
In bug #6351 it's pointed out that this fails unexpectedly:

CREATE TABLE tab (id SERIAL, a INTEGER, b INTEGER);
CREATE INDEX tab123 ON tab (a, b, a);
SELECT a, b FROM tab WHERE a = 0 AND b = 1;
ERROR:  btree index keys must be ordered by attribute

I looked into this a bit and find that indxpath.c is producing a correct
list of index quals, "a = 0 AND b = 1 AND a = 0", but then createplan.c
messes it up because it matches both copies of "a" to the first possible
match in the index's column list.  So what the executor gets looks like
"{INDEX_VAR 1} = 0 AND {INDEX_VAR 2} = 1 AND {INDEX_VAR 1} = 0"
and there's a btree implementation restriction that makes it spit up on
that.

Now, what the planner did here is more wrong than just tripping over a
btree limitation.  The actual use-case for an index mentioning the same
table column more than once, IMO, would be if the index columns had
different operator classes and thus supported different sets of
indexable operators.  Matching the second instance of "a" to the first
index column could then be asking the index to implement an operator
that that column doesn't support.  So we need to fix the planner not
btree.

The representation that indxpath.c actually emits is correct and
unambiguous, because it produces a list of sublists of indexquals,
one sublist per index column.  So in that format it's clear which
index column each clause is meant for.  But then we flatten the list
in create_index_path, and so createplan.c has to reverse-engineer
the matching, and it really can't get it right unless we're willing
to make it recheck operator matching not only column matching.

The obvious solution seems to be to preserve the list-of-sublists
representation through to createplan.c.  Then that code can just verify
the expected column match instead of searching, so it might actually be
a tad faster.  However such a change is going to affect cost_index and
all the amcostestimate functions, and likely break any planner plugins
that do anything with IndexPaths.  So it's going to be a bit invasive
and I'm inclined to think it's not back-patchable.

My inclination is to fix it that way in HEAD and just leave it as a
known unsupported case in the back branches.  This case does not seem
important enough to justify trying to find a solution that would work
without a path representation change.

Comments, other ideas?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Wishlist: parameterizable types
Next
From: Robert Haas
Date:
Subject: Re: reprise: pretty print viewdefs