On Tue, 2009-06-30 at 10:22 +1000, Robert Edwards wrote:
> Would this be in addition to a unique constraint on (a, b, c) (for the
> cases where c is not null)?
That depends on your app's needs. Do you need to enforce uniqueness of
(a,b,c) ? Or of (a,b) where c is _not_ null? If so, then yes, that's in
addition to the existing unique index over a, b and c.
If you're happy to permit multiple identical (a,b,c) sets so long as c
is null, then there's no need for the additional index.
> In the real app. a and b are not null ints and c is a date. The date
> indicates if and when a row has expired (there are other columns in the
> table). I am trying to avoid having separate columns for the "if" and
> the "when" of the expiry.
Sounds fairly sensible to me, though I have the unsubstantiated feeling
that sometimes storing a boolean + date can help the database optimise
queries better.
If you do store a boolean and a date, add a check constraint to ensure
sanity:
CHECK ( is_expired = (expired_date IS NOT NULL) )
since you don't want a valid expired_date for an unexpired row, or a row
that expired at an unknown time.
> One alternate would be to use a date way off into the future (such as
> the famous 9/9/99 case many COBOL programmers used back in the 60's...)
> and to test on expired < now ().
Ick. Ick. Ick.
> Another option is to use a separate shadow table for the expired rows
> and to use a trigger function to "move" expired rows to that shadow
> table. Then need to use UNION etc. when I need to search across both
> current and expired rows.
That can work. It might be worth using inherited tables so you have a
single parent table that has rows from both history and live tables,
though.
Often a partial index on the main table works just as well, though.
That's another benefit to adding the index on (a,b) where c is null:
it's a smaller index that takes up less memory and can be scanned much
faster.
Unlike partitioning the data into separate tables, though, you'll still
have a mix of expired and unexpired rows in the table heap. The data of
interest will be scattered across more blocks, so the index will be
bigger and more data will have to be read in to satisfy a `where c is
not null' constrained query. So a partial index isn't _as_ good as
partitioning the data - but it's quite a bit easier.
--
Craig Ringer