Re: Optimize date query for large child tables: GiST or GIN? - Mailing list pgsql-performance

From Thom Brown
Subject Re: Optimize date query for large child tables: GiST or GIN?
Date
Msg-id AANLkTim7O3Aalp31WKwBuTufYXeUMQWkwy51nvRcI0QY@mail.gmail.com
Whole thread Raw
In response to Re: Optimize date query for large child tables: GiST or GIN?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Optimize date query for large child tables: GiST or GIN?
List pgsql-performance
On 20 May 2010 20:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thombrown@gmail.com> writes:
>> On 20 May 2010 17:36, David Jarvis <thangalin@gmail.com> wrote:
>> Okay, get your app to convert the month-date to a day of year, so we
>> have year_start, year_end, day_of_year_start, day_of_year_end
>
>> and your where clause would something like this:
>
>> WHERE extract(YEAR from m.taken) BETWEEN year1 and year2
>> AND (
>>       extract(DOY from m.taken) BETWEEN day_of_year_start AND day_of_year_end
>>       OR (
>>               extract(DOY from m.taken) >= day_of_year_start OR extract(DOY from
>> m.taken) <= day_of_year_end
>>       )
>> )
>
> extract(DOY) seems a bit problematic here, because its day numbering is
> going to be different between leap years and non-leap years, and David's
> problem statement doesn't allow for off-by-one errors.  You could
> certainly invent your own function that worked similarly but always
> translated a given month/day to the same number.
>
> The other thing that's messy here is the wraparound requirement.
> Rather than trying an OR like the above (which I think doesn't quite
> work anyway --- won't it select everything?)

No.  It only would if using BETWEEN SYMMETRIC.

Like if m.taken is '2003-02-03', using a start day of year as 11th Nov
and end as 17th Feb, it would match the 2nd part of the outer OR
expression.  If you changed the end day of year to 2nd Feb, it would
yield no result as nothing is between 11th Nov and 17th Feb as it's a
negative difference, and 2nd Feb is lower than the taken date so fails
to match the first half of the inner most OR expression.

> , it would be better if
> you can have the app distinguish wraparound from non-wraparound cases
> and issue different queries in the two cases.  In the non-wrap case
> (start_day < end_day) it's pretty easy, just
>        my_doy(m.taken) BETWEEN start_val AND end_val
> The easy way to handle the wrap case is
>        my_doy(m.taken) <= start_val OR my_doy(m.taken) >= end_val
> although I can't help feeling there should be a smarter way to do
> this where you can use an AND range check on some modified expression
> derived from the date.
>
>                        regards, tom lane
>

Yes, I guess I agree that the app can run different queries depending
on which date is higher.  I hadn't factored leap years into the
equation.  Can't think of what could be done for those cases off the
top of my head.  What is really needed is a way to match against day
and month parts instead of day, month and year.... without resorting
to casting to text of course.

Thom

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Next
From: Tom Lane
Date:
Subject: Re: [BUGS] Query causing explosion of temp space with join involving partitioning