RE: Indexes for inequalities - Mailing list pgsql-novice

From Stephen Froehlich
Subject RE: Indexes for inequalities
Date
Msg-id CY1PR0601MB121088854901F6DDE541DC2FE5B30@CY1PR0601MB1210.namprd06.prod.outlook.com
Whole thread Raw
In response to Re: Indexes for inequalities  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Indexes for inequalities
List pgsql-novice
I'm averaging 1.5 ranges per ID over about 3 months.

Still, it's a good opportunity to learn to do it right.

Follow-up questions:
 - what data type do you use to go from RPostgreSQL to a tztrange using dbWriteTable?  (Or do I need to make it into a
textfield in R first ... which is doable.) 
 - how do you build a constraint that there are no overlapping ranges for a given ID?

Thanks again,
Stephen

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Friday, April 13, 2018 11:24 AM
To: Stephen Froehlich <s.froehlich@cablelabs.com>
Cc: pgsql-novice <pgsql-novice@postgresql.org>
Subject: Re: Indexes for inequalities

Stephen Froehlich <s.froehlich@cablelabs.com> writes:
> I am creating an intersect table where I have a relationship that is true for a period of time and then a series of
observations,so we're looking at something like: 

> SELECT * FROM
> observations
> INNER JOIN
> relationships
> ON (observations.id = relationships.id AND observations.time >=
> relationships.time_from AND observations.time < relationships.time_to)

> How do I best build indexes on "relationships", which is a few hundred
> thousand lines in length for a fast join?

You won't get terribly great results from standard btree indexes on that sort of range test.  If there are not too many
relationshipsentries per "id" then it might not matter, but if there are a lot then you need decent index selectivity
forthe time aspect too.  You might do better by representing the time_from/time_to pair as a range and then using a
GISTindex on the range, along the lines of 

SELECT * FROM
observations
INNER JOIN
relationships
ON (observations.id = relationships.id AND observations.time <@ tstzrange(relationships.time_from,
relationships.time_to))

I think you'd need the btree_gist extension as well, so that the index can be like

create index on relationships using gist (id, tstzrange(time_from,time_to));

You could do it just like this and leave the table storage alone, but it might be better to materialize the range value
asan actual column in the table. 

            regards, tom lane


pgsql-novice by date:

Previous
From: Tom Lane
Date:
Subject: Re: Indexes for inequalities
Next
From: "David G. Johnston"
Date:
Subject: Re: Indexes for inequalities