Thread: How to create a multi-column index with 2 dates using 'gist'?

How to create a multi-column index with 2 dates using 'gist'?

From
Fred Janon
Date:
Hi,

I am using 8.3 and pgAdmin III. I have a couple of tables using 2 DATE columns like 'startdate' and 'enddate' (just date, not interested in time in these columns). I have some queries (some using OVERLAPS) involving both 'startdate' and 'enddate' columns. I tried to create a multi column index using pgAdmin and it comes back with this error:

ERROR: data type date has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.

I search the pdf docs and online without finding what an "operator class" for DATE would be. Would a multi-column index help in that case (OVERLAPS and dates comparison) anyway? Or should I just define an index for each of the dates?

Below are the table and index defintions.

Thanks

Fred

---------------------------------------------
CREATE INDEX startenddate
   ON times USING gist (startdate, enddate);

---------------------------------------------
-- Table: times

-- DROP TABLE times;

CREATE TABLE times
(
  id serial NOT NULL,
  startdate date NOT NULL,
  enddate date NOT NULL,
  starttime time without time zone,
  endtime time without time zone,
  CONSTRAINT pk_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE times OWNER TO postgres;
GRANT ALL ON TABLE times TO postgres;
GRANT ALL ON TABLE times TO public;

Re: How to create a multi-column index with 2 dates using 'gist'?

From
Sam Mason
Date:
On Mon, Aug 24, 2009 at 05:24:59PM +0800, Fred Janon wrote:
> I am using 8.3 and pgAdmin III. I have a couple of tables using 2 DATE
> columns like 'startdate' and 'enddate' (just date, not interested in time in
> these columns). I have some queries (some using OVERLAPS) involving both
> 'startdate' and 'enddate' columns. I tried to create a multi column index
> using pgAdmin and it comes back with this error:
>
> ERROR: data type date has no default operator class for access method "gist"
> HINT: You must specify an operator class for the index or define a default
> operator class for the data type.

I've not had the opportunity to try doing this, but it would seem to
require hacking some C code to get this working.  Have a look here:

  http://www.postgresql.org/docs/current/static/gist.html

> I search the pdf docs and online without finding what an "operator class"
> for DATE would be. Would a multi-column index help in that case (OVERLAPS
> and dates comparison) anyway? Or should I just define an index for each of
> the dates?

An operator class bundles together various bits of code so that the
index knows which functions to call when it needs to compare things.

If you were creating an GiST index over a pair of dates to support
an "overlaps" operator you'd have to define a set of functions that
implement the various checks needed.


Depending on your data you may be easier with just a multi-column index
and using normal comparisons, I can't see how OVERLAPS could use indexes
as it does some strange things with NULL values.  The cases a B-Tree
index would win over GiST (this is an educated guess) is when few of the
ranges overlap within a table.  If that's the case then I'd do:

  CREATE INDEX tbl_start_end_idx ON tbl (startdate,enddate);

to create the btree index (they're the default, so nothing else is
needed) and then write queries as:

  SELECT r.range, t.*
  FROM tbl t, ranges r
  WHERE t.startdate <= r.rangeend
    AND t.enddate   >= r.rangestart;

if there are lots of overlapping ranges in the table then this is going
to do badly and you may need to start thinking about writing some C code
to get a GiST index going.

--
  Sam  http://samason.me.uk/

Re: How to create a multi-column index with 2 dates using 'gist'?

From
Fred Janon
Date:
Thanks Sam. I looked at the gist documentation and although it would be fun, I don't have the time at the moment to explore that avenue (and scratching my head!). I also think it would require a lot of work testing to validate the code and that the gist index is better than the B-tree one. So I am following your advice using a B-tree index for now.

Basically I have an events table representing events with a duration (startdate, enddate). I was wondering if it would improve the performance if I was creating a separate table (indexed as you suggested) with the date ranges (startdate, enddate) and point to that from my events table. That would eliminate the duplicate ranges, costing a join to find the events within a date range, but maybe improving the search performance for events that overlap a certain date range. Any feedback on that?

Thanks

Fred

On Tue, Aug 25, 2009 at 18:52, Sam Mason <sam@samason.me.uk> wrote:
On Mon, Aug 24, 2009 at 05:24:59PM +0800, Fred Janon wrote:
> I am using 8.3 and pgAdmin III. I have a couple of tables using 2 DATE
> columns like 'startdate' and 'enddate' (just date, not interested in time in
> these columns). I have some queries (some using OVERLAPS) involving both
> 'startdate' and 'enddate' columns. I tried to create a multi column index
> using pgAdmin and it comes back with this error:
>
> ERROR: data type date has no default operator class for access method "gist"
> HINT: You must specify an operator class for the index or define a default
> operator class for the data type.

I've not had the opportunity to try doing this, but it would seem to
require hacking some C code to get this working.  Have a look here:

 http://www.postgresql.org/docs/current/static/gist.html

> I search the pdf docs and online without finding what an "operator class"
> for DATE would be. Would a multi-column index help in that case (OVERLAPS
> and dates comparison) anyway? Or should I just define an index for each of
> the dates?

An operator class bundles together various bits of code so that the
index knows which functions to call when it needs to compare things.

If you were creating an GiST index over a pair of dates to support
an "overlaps" operator you'd have to define a set of functions that
implement the various checks needed.


Depending on your data you may be easier with just a multi-column index
and using normal comparisons, I can't see how OVERLAPS could use indexes
as it does some strange things with NULL values.  The cases a B-Tree
index would win over GiST (this is an educated guess) is when few of the
ranges overlap within a table.  If that's the case then I'd do:

 CREATE INDEX tbl_start_end_idx ON tbl (startdate,enddate);

to create the btree index (they're the default, so nothing else is
needed) and then write queries as:

 SELECT r.range, t.*
 FROM tbl t, ranges r
 WHERE t.startdate <= r.rangeend
   AND t.enddate   >= r.rangestart;

if there are lots of overlapping ranges in the table then this is going
to do badly and you may need to start thinking about writing some C code
to get a GiST index going.

--
 Sam  http://samason.me.uk/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: How to create a multi-column index with 2 dates using 'gist'?

From
Sam Mason
Date:
On Tue, Aug 25, 2009 at 07:39:26PM +0800, Fred Janon wrote:
> Basically I have an events table representing events with a duration
> (startdate, enddate). I was wondering if it would improve the performance if
> I was creating a separate table (indexed as you suggested) with the date
> ranges (startdate, enddate) and point to that from my events table. That
> would eliminate the duplicate ranges, costing a join to find the events
> within a date range, but maybe improving the search performance for events
> that overlap a certain date range. Any feedback on that?

It depends on the sorts of queries you're going to be doing most often.

Not sure how is best to explain when GiST is going to win, but if you
think of a rectangle with the start dates going along the top edge and
the end dates going down the side.  If you sort the values by the start
date will you end up with most of them on a diagonal or will they be
scattered randomly around.  I.e the less correlation between the start
and end date the better GiST will do, relative to a btree index.  I
think that's right anyway!

--
  Sam  http://samason.me.uk/

Re: How to create a multi-column index with 2 dates using 'gist'?

From
Gerhard Heift
Date:
On Mon, Aug 24, 2009 at 05:24:59PM +0800, Fred Janon wrote:
> Hi,
>
> I am using 8.3 and pgAdmin III. I have a couple of tables using 2 DATE columns
> like 'startdate' and 'enddate' (just date, not interested in time in these
> columns). I have some queries (some using OVERLAPS) involving both 'startdate'
> and 'enddate' columns. I tried to create a multi column index using pgAdmin and
> it comes back with this error:
>
> ERROR: data type date has no default operator class for access method "gist"
> HINT: You must specify an operator class for the index or define a default
> operator class for the data type.
>
> I search the pdf docs and online without finding what an "operator class" for
> DATE would be. Would a multi-column index help in that case (OVERLAPS and dates
> comparison) anyway? Or should I just define an index for each of the dates?
>
> Below are the table and index defintions.

Have a look at http://pgfoundry.org/projects/temporal

But currently there is no way to avoid overlapping of such periods :(

> Thanks
>
> Fred

Regards,
  Gerhard

Attachment

Re: How to create a multi-column index with 2 dates using 'gist'?

From
Fred Janon
Date:
Thanks Gerhard, interesting but I wonder if it is a maintained project, the files date from May 2008 and there is not much forum activity. I'll out it on my list of "To be investigated".

Fred

On Wed, Aug 26, 2009 at 22:26, Gerhard Heift <ml-postgresql-20081012-3518@gheift.de> wrote:
But currently there is no way to avoid overlapping of such periods

Re: How to create a multi-column index with 2 dates using 'gist'?

From
Jeff Davis
Date:
On Wed, 2009-08-26 at 23:08 +0800, Fred Janon wrote:
> Thanks Gerhard, interesting but I wonder if it is a maintained
> project, the files date from May 2008 and there is not much forum
> activity. I'll out it on my list of "To be investigated".

Well, it's maintained in the sense of "I don't know of any problems with
it." Right now all it does is implement the PERIOD data type, which is
indexable so that you can do searches on predicates like
"&&" (overlaps).

It may get a little more exciting when more features like temporal keys
(which I'm planning to make possible in the next commitfest) or temporal
joins (no serious plans yet, but seems doable) are implemented.

Regards,
    Jeff Davis