Thread: Index usage for tstzrange?
Folks, I just noticed that if I use a tstzrange for convenience, a standard btree index on a timestamp won't get used for it. Example: table a ( id int, val text, ts timestamptz ); index a_ts on a(ts); SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00') ... will NOT use the index a_ts. Is this something which could be fixed for 9.4? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > I just noticed that if I use a tstzrange for convenience, a standard > btree index on a timestamp won't get used for it. Example: > table a ( > id int, > val text, > ts timestamptz > ); > index a_ts on a(ts); > SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00') > ... will NOT use the index a_ts. Well, no. <@ is not a btree-indexable operator. What I find more disturbing is that this is what I get from the example in HEAD: regression=# explain SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00'); ERROR: XX000: type 1184 is not a range type LOCATION: range_get_typcache, rangetypes.c:1451 Haven't traced through it to determine exactly what's happening, but isn't this a legitimate usage? And if it isn't, surely a more user-facing error ought to be getting thrown somewhere upstream of here. regards, tom lane
On Thu, Mar 21, 2013 at 5:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josh Berkus <josh@agliodbs.com> writes:
> I just noticed that if I use a tstzrange for convenience, a standard
> btree index on a timestamp won't get used for it. Example:
> table a (
> id int,
> val text,
> ts timestamptz
> );
> index a_ts on a(ts);
> SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')
> ... will NOT use the index a_ts.
Well, no. <@ is not a btree-indexable operator.
What I find more disturbing is that this is what I get from the example
in HEAD:
regression=# explain SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR: XX000: type 1184 is not a range type
LOCATION: range_get_typcache, rangetypes.c:1451
Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage? And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.
regards, tom lane
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
It is a legit usage, this is from a test i did myself (9.2.3)
test=# explain SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-04-01 00:10:00');
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..23.75 rows=1 width=44)
Filter: (ts <@ '["2013-01-01 00:00:00+02","2013-04-01 00:10:00+03")'::tstzrange)
On 21.03.2013 06:07, Vasilis Ventirozos wrote: > On Thu, Mar 21, 2013 at 5:58 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> What I find more disturbing is that this is what I get from the example >> in HEAD: >> >> regression=# explain SELECT * FROM a WHERE ts<@ >> tstzrange('2013-01-01','2013-01-01 00:10:00'); >> ERROR: XX000: type 1184 is not a range type >> LOCATION: range_get_typcache, rangetypes.c:1451 >> >> Haven't traced through it to determine exactly what's happening, but >> isn't this a legitimate usage? And if it isn't, surely a more >> user-facing error ought to be getting thrown somewhere upstream of here. > > It is a legit usage, this is from a test i did myself (9.2.3) > > test=# explain SELECT * FROM a WHERE ts<@ > tstzrange('2013-01-01','2013-04-01 00:10:00'); > QUERY PLAN > ------------------------------------------------------------------------------------ > Seq Scan on a (cost=0.00..23.75 rows=1 width=44) > Filter: (ts<@ '["2013-01-01 00:00:00+02","2013-04-01 > 00:10:00+03")'::tstzrange) Looks like the range type cost estimation patch broke this, back in August already. The case of var <@ constant, where constant is a range and var is an element, that's broken. The cost estimation function, rangesel(), incorrectly assumes that the 'var' is always a range type. It's a bit worrying that no-one noticed until now. I'll add a test for that operator to the rangetypes regression test. The immediate fix is attached, but this made me realize that rangesel() is still missing estimation for the "element <@ range" operator. It shouldn't be hard to implement, I'm pretty sure we have all the statistics we need for that. - Heikki
Attachment
> Well, no. <@ is not a btree-indexable operator. Yes, but it's equivalent to ( ( a >= b1 or b1 is null ) and ( a < b2 or b2 is null ) ), which *is* btree-indexable and can use an index. So it seems like the kind of optimization we could eventually make. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 21.03.2013 17:55, Alexander Korotkov wrote: > On Thu, Mar 21, 2013 at 12:52 PM, Heikki Linnakangas< >> The immediate fix is attached, but this made me realize that rangesel() is >> still missing estimation for the "element<@ range" operator. It shouldn't >> be hard to implement, I'm pretty sure we have all the statistics we need >> for that. > > Probably we could even call existing scalarltsel and scalargtsel for this > case. I came up with the attached. I didn't quite use scalarltsel, but I used the scalarineqsel function, which contains the "guts" of scalarltsel and scalargtsel. One thing I wasn't quite sure of (from the patch): > /* > * We use the data type's default < operator. This is bogus, if the range > * type's rngsubopc operator class is different. In practice, that ought > * to be rare. It would also be bogus to use the < operator from the > * rngsubopc operator class, because the statistics are collected using > * using the default operator class, anyway. > * > * For the same reason, use the default collation. The statistics are > * collected with the default collation. > */ Does that make sense? The other option would be to use the < operator from the rngsubopc op class, even though the scalar statistics are collected with the default b-tree < operator. As long as the two sort roughly the same way, you get reasonable results either way. Yet another option would be to use histogram_selectivity() instead of ineq_histogram_selectivity(), if the range's rngsubopc opclass isn't the type's default opclass. histogram_selectivity() works with any operator regardless of the sort ordering, basically using the histogram values merely as a sample, rather than as a histogram. But I'm reluctant to make this any more complicated, as using a non-default opclass for the range type is rare. - Heikki
Attachment
On 22.03.2013 02:05, Josh Berkus wrote: >> Well, no.<@ is not a btree-indexable operator. > > Yes, but it's equivalent to ( ( a>= b1 or b1 is null ) and ( a< b2 or > b2 is null ) ), which *is* btree-indexable and can use an index. So it > seems like the kind of optimization we could eventually make. Yeah. The sort order of <@ is the same as regular b-tree, so it should be possible. In fact, nothing stops you from creating the suitable operator and b-tree support functions. See attached patch for int4, but the same should work for timestamptz. We should do this automatically. Or am I missing something? - Heikki
Attachment
> We should do this automatically. Or am I missing something? Aside from the need to support @> as well, not that I can see. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > We should do this automatically. Or am I missing something? Yes. This is not equality. > ALTER OPERATOR FAMILY integer_ops USING btree ADD > OPERATOR 3 <@ (int4, int4range), > FUNCTION 1 btint4rangecmp(int4, int4range); That will break approximately everything in sight, starting with the planner's opinion of what equality is. There is *way* too much stuff that knows the semantics of btree opclasses for us to start jamming random operators into them, even if this seemed to work in trivial testing. (See the last section of src/backend/access/nbtree/README to just scratch the surface of the assumptions this breaks.) It's possible that for constant ranges we could have the planner expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c). I'd like to find a way to make that type of optimization pluggable, though --- the existing approach of hard-wiring knowledge into indxpath.c has never been anything but a kluge, and it definitely doesn't scale as-is to anything except built-in types and operators. regards, tom lane