Re: New approach to ye olde cross-datatype indexing problem - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: New approach to ye olde cross-datatype indexing problem |
Date | |
Msg-id | 29832.1068682253@sss.pgh.pa.us Whole thread Raw |
In response to | New approach to ye olde cross-datatype indexing problem (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: New approach to ye olde cross-datatype indexing problem
|
List | pgsql-hackers |
I wrote: > What if we attack the problem head on --- make indexes able to handle > cross-datatype comparison operators directly? That would solve the > problem without *any* semantic side-effects. AFAIR we've never seriously > considered doing that; I guess anyone who thought of it dismissed it as > too hard. But I've been studying the idea for the last day or so and I'm > starting to think that it's eminently practical. In fact, it works --- as of CVS tip, you can do: regression=# create table foo (f1 bigint primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo" CREATE TABLE regression=# explain select * from foo where f1 = 44; QUERY PLAN --------------------------------------------------------------------Index Scan using foo_pkey on foo (cost=0.00..4.82 rows=1width=8) Index Cond: (f1 = 44) (2 rows) or even regression=# explain select * from foo where f1 = 44::int2; QUERY PLAN --------------------------------------------------------------------Index Scan using foo_pkey on foo (cost=0.00..4.82 rows=1width=8) Index Cond: (f1 = 44::smallint) (2 rows) As seems to happen every time I touch the btree code, I learned a few formerly-undocumented details about how it works :-(. It turns out that allowing an operator class to include operators between the indexed datatype and other datatypes isn't really sufficient for everything the backend wants to do. In particular, there are places where the system wants to determine which of two inequalities is a tighter constraint. For example, consider a query like SELECT ... WHERE x > 4 AND x >= 5; The x > 4 condition is redundant and can be thrown away. The btree subsystem actually has code to do this, and in fact was *depending on* the assumption that it could do so, because there were bits of logic that assumed they'd not have to deal with more than one inequality constraint at a time. Now in order to do this, the code has to compare the constants (4 and 5). This is no problem as long as an operator class deals with just a single datatype --- you can pick the needed comparison operator out of the opclass. However, in a cross-type situation you could have something like SELECT ... WHERE int8var > 4::int4 AND int8var >= 5::int2; where the operators appearing in the query are int84lt and int82le. To determine which condition is a tighter constraint, you'd need to apply an int4-vs-int2 comparison --- and there are no such operators available in the int8 opclass. Since the whole point of an opclass is to encapsulate a particular comparison semantics, it's not reasonable for the system to guess at which comparison operator on the other type(s) is compatible with the opclass it's working with. And coercing the comparison values to the opclass' own datatype is no good (that opens up all the problems we wanted to avoid, such as possible overflow). For the moment, I've just punted this problem; the btree code will not optimize away redundant cross-type comparisons. (I went and fixed the places that failed to cope with extra conditions.) It would be nice to do better, but it's surely a much lower priority than getting cross-type comparisons to be indexed at all. Looking to the long term of how we might deal with this if we wanted to invest more effort, I can think of a couple of ways to proceed. One is to allow an opclass to contain operators that aren't directly usable with the index at all, but do apply to datatypes that are used within the opclass --- for instance, we'd have to add all the int4-vs-int4 operators, *and* the int4-vs-int2 and int2-vs-int2 operators, into the int8 opclass; likewise for the int2 and int4 opclasses. This seems like a recipe for bloating pg_amop quite a lot; there'd be a lot of redundancy among these opclasses. The other idea I had was to somehow create associations of opclasses, so that you could say that "int2_ops, int4_ops, and int8_ops all sort values compatibly, so it's okay to go find the operator you need in int4_ops or int2_ops". I don't have any concrete idea about how to represent that though. Any thoughts about that, or other ideas altogether? For the moment I'm satisfied with what we have; this is already a big step forward on a problem that's been there forever. But there's always something else that could be done ... regards, tom lane
pgsql-hackers by date: