Re: Index on a range array - Mailing list pgsql-performance
From | Heikki Linnakangas |
---|---|
Subject | Re: Index on a range array |
Date | |
Msg-id | 520C83AF.5040800@vmware.com Whole thread Raw |
In response to | Index on a range array (Daniel Cristian Cruz <danielcristian@gmail.com>) |
List | pgsql-performance |
On 13.08.2013 23:47, Daniel Cristian Cruz wrote: > Hello, > > I'm trying to simplify a schema, where I had many ranges floating around. > My idea is to put them all in an array field and query like this: > > SELECT > event.* > FROM event > JOIN participant_details > USING (participant_id) > WHERE > tsrange(event.start, event.end)&& ANY (participant_details.periods); > > periods is tsrange[]. > > I've tryed and it worked, but without indexes. I've tried something, but > didn't found anything... Does someone know how to index this kind of field > (tsrange[])? > > From the docs I learn that there is some GIST magic, but I would need to > code in C. Is that true? Yeah. It might be somewhat tricky to write an efficient GIST implementation for this anyway. What you'd really want to do is to index each value in the array separately, which is more like what GIN does. With the "partial match" infrastructure in GIN, it might be possible to write a GIN implementation that can speed up range overlap queries. However, that certainly requires C coding too. A couple of alternatives come to mind: You could create the index on just the min and max values of the periods, and in the query check for overlap with that. If there typically aren't big gaps between the periods of each participant, that might work well. Or you could split the range of expected timestamps into discrete steps, for example at one-day granularity. Create a function to convert a range into an array of steps, e.g convert each range into an array of days that the range overlaps with. Create a GIN index on that array, and use it in the query. Something like this: -- Returns an int representing the day the given timestamp falls into create function epochday(timestamp) returns int4 as $$ select extract (epoch from $1)::int4/(24*3600) $$ language sql immutable; -- Same for a range. Returns an array of ints representing all the -- days that the given range overlaps with. create function epochdays(tsrange) returns integer[] as $$ select array_agg(g) from generate_series(epochday(lower($1)), epochday(upper($1))) g $$ language sql immutable; -- Same for an array of ranges. Returns an array of ints representing -- all the days that overlap with any of the given timestamp ranges create function epochdays(ranges tsrange[]) returns integer[] as $$ declare r tsrange; result integer[]; begin foreach r in array ranges loop result = result || (select array_agg(g) from generate_series(epochday(lower(r)), epochday(upper(r))) g); end loop; return result; end; $$ language plpgsql immutable; -- Create the index on that: create index period_days on participant_details using gin (epochdays(periods)); -- Query like this: SELECT event.* FROM event JOIN participant_details USING (participant_id) -- This WHERE-clause is for correctness: WHERE tsrange(event.start, event.end) && ANY (participant_details.periods); -- and this is to make use of the index: AND epochdays(tsrange(event.start, event.end)) && epochdays((participant_details.periods)); - Heikki
pgsql-performance by date: