Thread: Index usage for tstzrange?

Index usage for tstzrange?

From

Josh Berkus

Date:

21 March 2013, 03:13:11

Folks,

I just noticed that if I use a tstzrange for convenience, a standard
btree index on a timestamp won't get used for it.  Example:

table a (
    id int,
    val text,
    ts timestamptz
);
index a_ts on a(ts);

SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')

... will NOT use the index a_ts.  Is this something which could be fixed
for 9.4?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Index usage for tstzrange?

From

Tom Lane

Date:

21 March 2013, 06:59:04

Josh Berkus <josh@agliodbs.com> writes:
> I just noticed that if I use a tstzrange for convenience, a standard
> btree index on a timestamp won't get used for it.  Example:

> table a (
>     id int,
>     val text,
>     ts timestamptz
> );
> index a_ts on a(ts);

> SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')

> ... will NOT use the index a_ts.

Well, no.  <@ is not a btree-indexable operator.

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR:  XX000: type 1184 is not a range type
LOCATION:  range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage?  And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.

            regards, tom lane

Re: Index usage for tstzrange?

From

Vasilis Ventirozos

Date:

21 March 2013, 07:07:32

On Thu, Mar 21, 2013 at 5:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Josh Berkus <josh@agliodbs.com> writes:
> I just noticed that if I use a tstzrange for convenience, a standard
> btree index on a timestamp won't get used for it. Example:

> table a (
> id int,
> val text,
> ts timestamptz
> );
> index a_ts on a(ts);

> SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')

> ... will NOT use the index a_ts.

Well, no. <@ is not a btree-indexable operator.

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR: XX000: type 1184 is not a range type
LOCATION: range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage? And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

It is a legit usage, this is from a test i did myself (9.2.3)

test=# explain SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-04-01 00:10:00');
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..23.75 rows=1 width=44)
Filter: (ts <@ '["2013-01-01 00:00:00+02","2013-04-01 00:10:00+03")'::tstzrange)

Re: Index usage for tstzrange?

From

Heikki Linnakangas

Date:

21 March 2013, 12:05:19

On 21.03.2013 06:07, Vasilis Ventirozos wrote:
> On Thu, Mar 21, 2013 at 5:58 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> What I find more disturbing is that this is what I get from the example
>> in HEAD:
>>
>> regression=# explain SELECT * FROM a WHERE ts<@
>> tstzrange('2013-01-01','2013-01-01 00:10:00');
>> ERROR:  XX000: type 1184 is not a range type
>> LOCATION:  range_get_typcache, rangetypes.c:1451
>>
>> Haven't traced through it to determine exactly what's happening, but
>> isn't this a legitimate usage?  And if it isn't, surely a more
>> user-facing error ought to be getting thrown somewhere upstream of here.
>
> It is a legit usage, this is from a test i did myself (9.2.3)
>
> test=# explain SELECT * FROM a WHERE ts<@
> tstzrange('2013-01-01','2013-04-01 00:10:00');
>                                       QUERY PLAN
> ------------------------------------------------------------------------------------
>   Seq Scan on a  (cost=0.00..23.75 rows=1 width=44)
>     Filter: (ts<@ '["2013-01-01 00:00:00+02","2013-04-01
> 00:10:00+03")'::tstzrange)

Looks like the range type cost estimation patch broke this, back in
August already. The case of var <@ constant, where constant is a range
and var is an element, that's broken. The cost estimation function,
rangesel(), incorrectly assumes that the 'var' is always a range type.

It's a bit worrying that no-one noticed until now. I'll add a test for
that operator to the rangetypes regression test.

The immediate fix is attached, but this made me realize that rangesel()
is still missing estimation for the "element <@ range" operator. It
shouldn't be hard to implement, I'm pretty sure we have all the
statistics we need for that.

- Heikki

Attachment

fix-element-contained-by-range-op.patch

Re: Index usage for tstzrange?

From

Josh Berkus

Date:

22 March 2013, 03:05:45

> Well, no.  <@ is not a btree-indexable operator.

Yes, but it's equivalent to ( ( a >= b1 or b1 is null ) and ( a < b2 or
b2 is null ) ), which *is* btree-indexable and can use an index.  So it
seems like the kind of optimization we could eventually make.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Index usage for tstzrange?

From

Heikki Linnakangas

Date:

23 March 2013, 00:53:42

On 21.03.2013 17:55, Alexander Korotkov wrote:
> On Thu, Mar 21, 2013 at 12:52 PM, Heikki Linnakangas<
>> The immediate fix is attached, but this made me realize that rangesel() is
>> still missing estimation for the "element<@ range" operator. It shouldn't
>> be hard to implement, I'm pretty sure we have all the statistics we need
>> for that.
>
> Probably we could even call existing scalarltsel and scalargtsel for this
> case.

I came up with the attached. I didn't quite use scalarltsel, but I used
the scalarineqsel function, which contains the "guts" of scalarltsel and
scalargtsel.

One thing I wasn't quite sure of (from the patch):

>     /*
>      * We use the data type's default < operator. This is bogus, if the range
>      * type's rngsubopc operator class is different. In practice, that ought
>      * to be rare. It would also be bogus to use the < operator from the
>      * rngsubopc operator class, because the statistics are collected using
>      * using the default operator class, anyway.
>      *
>      * For the same reason, use the default collation. The statistics are
>      * collected with the default collation.
>      */

Does that make sense? The other option would be to use the < operator
from the rngsubopc op class, even though the scalar statistics are
collected with the default b-tree < operator. As long as the two sort
roughly the same way, you get reasonable results either way. Yet another
option would be to use histogram_selectivity() instead of
ineq_histogram_selectivity(), if the range's rngsubopc opclass isn't the
type's default opclass. histogram_selectivity() works with any operator
regardless of the sort ordering, basically using the histogram values
merely as a sample, rather than as a histogram. But I'm reluctant to
make this any more complicated, as using a non-default opclass for the
range type is rare.

- Heikki

Attachment

estimate-elem-contained-by-1.patch

Re: Index usage for tstzrange?

From

Heikki Linnakangas

Date:

23 March 2013, 01:53:26

On 22.03.2013 02:05, Josh Berkus wrote:
>> Well, no.<@ is not a btree-indexable operator.
>
> Yes, but it's equivalent to ( ( a>= b1 or b1 is null ) and ( a<  b2 or
> b2 is null ) ), which *is* btree-indexable and can use an index.  So it
> seems like the kind of optimization we could eventually make.

Yeah. The sort order of <@ is the same as regular b-tree, so it should
be possible. In fact, nothing stops you from creating the suitable
operator and b-tree support functions. See attached patch for int4, but
the same should work for timestamptz.

We should do this automatically. Or am I missing something?

- Heikki

Attachment

btree-support-for-int4-contained-by-int4range.sql

Re: Index usage for tstzrange?

From

Josh Berkus

Date:

23 March 2013, 04:23:59

> We should do this automatically. Or am I missing something?

Aside from the need to support @> as well, not that I can see.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Index usage for tstzrange?

From

Tom Lane

Date:

23 March 2013, 07:31:57

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> We should do this automatically. Or am I missing something?

Yes.  This is not equality.

> ALTER OPERATOR FAMILY integer_ops USING btree ADD
>   OPERATOR 3 <@ (int4, int4range),
>   FUNCTION 1 btint4rangecmp(int4, int4range);

That will break approximately everything in sight, starting with the
planner's opinion of what equality is.  There is *way* too much stuff
that knows the semantics of btree opclasses for us to start jamming
random operators into them, even if this seemed to work in trivial
testing.  (See the last section of src/backend/access/nbtree/README
to just scratch the surface of the assumptions this breaks.)

It's possible that for constant ranges we could have the planner expand
"intcol <@ 'x,y'::int4range" into "intcol between x and y", using
something similar to the index LIKE optimization (ie, the "special
operator" stuff in indxpath.c).  I'd like to find a way to make that
type of optimization pluggable, though --- the existing approach of
hard-wiring knowledge into indxpath.c has never been anything but
a kluge, and it definitely doesn't scale as-is to anything except
built-in types and operators.

            regards, tom lane