Re: date - range - Mailing list pgsql-performance

From Mischa
Subject Re: date - range
Date
Msg-id 1112421584.424e34d0d2d63@webmail.telus.net
Whole thread Raw
In response to date - range  ("H.J. Sanders" <hjs@rmax.nl>)
Responses Re: date - range  (Mischa <mischa.Sandberg@telus.net>)
Re: date - range  (Michael Fuhr <mike@fuhr.org>)
List pgsql-performance
Quoting "H.J. Sanders" <hjs@rmax.nl>:

>
> Anybody a solution for the next problem:
> people can subscribe to a service for 1 or more days (upto a max. of 365).
> So in the database is stored: first_date and last_date
> To select which people are subscribed for a certain date (e.g. today) we use
> a select like
>
> select   ....... where first_date <= today and last_date >= today
>
> Whatever index we create system always does a sequential scan (which I can
> understand). Has someone a smarter solution?

Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index.

To save typing, I'm going to pretend all your dates are stored as integers.
In reality, you'll probably be writing views with lots of EXTRACT(EPOCH...)'s in
them, to achieve the same result.

Suppose you have table People(id, first_date, last_date, ...)
Each such range "fits" in some larger fixed range of 1,2,4, ... days
that starts and ends on a fixed (epoch) date multiple of 1,2,4,...
For example, if your range were days (1040..1080), then that fits in the
64-wide range (1024...1088]. You calculate the start and width of the range that
just fits, and store that in People, too. Now, you index on (start,width).

Now, when you want to query for a given "today", you have to try for
all possible widths in People. Fortunately, that's darn few!
The ranges up to a decade (!) will still mean only 16 different widths.
A max range of one year (<512 days) means only 9 widths.
You can do this with a tiny static table.

Then: the query:

SELECT  People.* FROM People
JOIN Widths
ON    People.start = today - today % Widths.width
AND   People.width = Widths.width

Though this may look gross, it makes an index work where no normal BTree index
would. I've used it for some really nasty data conversions of 100M-row tables.

Your first name wouldn't be "Harlan", would it? :-)
-- "Dreams come true, not free."


pgsql-performance by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: date - range
Next
From: Mischa
Date:
Subject: Re: date - range