Michal Taborsky said:
> Richard Huxton wrote:
> > Where you don't have a valid date to store you should use NULL. This
> > business of storing zeroes is a horrible MySQL design mistake.
>
> Well, yes and no. It certainly is a design mistake and introduces
> incosistency into the database, but after I was bitten several times by
> NULL values I'd go for solution like this any day. Let me explain.
>
> We had a table of messages, which was inserted to randomly and every few
> minutes we'd walk through the unprocessed messages and perform some work
> on them. I, trying to have the database as clean as possible, used this
> table definition (simplified):
>
> messages (
> id serial,
> data text,
> arrived timestamp default now,
> processed timestamp)
>
> So after the message arrived, it had the processed field set to null,
> which was perfectly consistent and represented what it realy was--an
> unknown value.
>
> We'd then simply SELECT * FROM messages WHERE processed IS NULL and were
> happy ever after, only to realise after the table had grown to few
> thousands rows, that the SELECT took ages, because the system had to
> perform seqscan. Aha! So we added an index on processed, because common
> sense told me, that as long as there are 100k rows and only 10 of them
> are NULL, the index would be very selective and therefore useful.
>
> I guess you know where it ends--the index is not used for IS [NOT] NULL
> expressions. The obvious workaround was to add DEFAULT value to
> "processed" in form of kind of anchor (we used '-infinity') and then do
> SELECT * FROM messages WHERE processed='-infinity'.
>
> Bingo! The thing went 100x faster. So we could choose to have
> standards-compliant, very clean database design OR the thing that does
> what it's supposed to do in reasonable time. And believe me, it's kind
> of difficult to explain to our logistics department that we could have
> done the thing to return results in milliseconds instead of 10 secs, but
> chose not to for sake of clean design.
>
> It'd be really nice if we didn't have to use such hacks, but hey, life's
> inperfect.
It'd probably be better design to not use the date as a flag. This issue
actually came up for me yesterday with an application that is now being ported
to Postgres. Previously a null "ship date" indicated that an item to be
shipped had not gone yet. I'm adding a flag, not just because of this issue
you describe, but it is also more intuitive for anyone looking at the data
who is unfamiliar with the business logic.
Best,
Jim Wilson