Re: Index on a range array - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: Index on a range array
Date
Msg-id 520C83AF.5040800@vmware.com
Whole thread Raw
In response to Index on a range array  (Daniel Cristian Cruz <danielcristian@gmail.com>)
List pgsql-performance
On 13.08.2013 23:47, Daniel Cristian Cruz wrote:
> Hello,
>
> I'm trying to simplify a schema, where I had many ranges floating around.
> My idea is to put them all in an array field and query like this:
>
> SELECT
>   event.*
> FROM event
> JOIN participant_details
>   USING (participant_id)
> WHERE
>   tsrange(event.start, event.end)&&  ANY (participant_details.periods);
>
> periods is tsrange[].
>
> I've tryed and it worked, but without indexes. I've tried something, but
> didn't found anything... Does someone know how to index this kind of field
> (tsrange[])?
>
>  From the docs I learn that there is some GIST magic, but I would need to
> code in C. Is that true?

Yeah. It might be somewhat tricky to write an efficient GIST
implementation for this anyway. What you'd really want to do is to index
each value in the array separately, which is more like what GIN does.
With the "partial match" infrastructure in GIN, it might be possible to
write a GIN implementation that can speed up range overlap queries.
However, that certainly requires C coding too.

A couple of alternatives come to mind:

You could create the index on just the min and max values of the
periods, and in the query check for overlap with that. If there
typically aren't big gaps between the periods of each participant, that
might work well.

Or you could split the range of expected timestamps into discrete steps,
for example at one-day granularity. Create a function to convert a range
into an array of steps, e.g convert each range into an array of days
that the range overlaps with. Create a GIN index on that array, and use
it in the query. Something like this:

-- Returns an int representing the day the given timestamp falls into
create function epochday(timestamp) returns int4 as $$
   select extract (epoch from $1)::int4/(24*3600)
$$ language sql immutable;

-- Same for a range. Returns an array of ints representing all the
-- days that the given range overlaps with.
create function epochdays(tsrange) returns integer[]
as $$
   select array_agg(g) from generate_series(epochday(lower($1)),
epochday(upper($1))) g
$$
language sql immutable;

-- Same for an array of ranges. Returns an array of ints representing --
all the days that overlap with any of the given timestamp ranges
create function epochdays(ranges tsrange[]) returns integer[]
as $$
declare
   r tsrange;
   result integer[];
begin
   foreach r in array ranges loop
     result = result || (select array_agg(g) from
generate_series(epochday(lower(r)), epochday(upper(r))) g);
   end loop;
   return result;
end;
$$ language plpgsql immutable;

-- Create the index on that:
create index period_days on participant_details using gin
(epochdays(periods));

-- Query like this:
SELECT event.* FROM event
JOIN participant_details USING (participant_id)
-- This WHERE-clause is for correctness:
WHERE tsrange(event.start, event.end) &&  ANY (participant_details.periods);
-- and this is to make use of the index:
AND epochdays(tsrange(event.start, event.end)) &&
epochdays((participant_details.periods));

- Heikki


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Interesting case of IMMUTABLE significantly hurting performance
Next
From: Josh Berkus
Date:
Subject: Re: Need some basic information