Thread: use null or 0 in foreign key column, to mean "no value"?
Hi, (Maybe my subject line should be: `is not distinct from` and indexes.) In Postgres 9.4, I’ve got a table of ‘items’ that references a table ‘colors’. Not all items have colors, so I created anullable column in items like: color_id bigint references colors There is also an index on color_id: create index on items (color_id); I thought this was the right way to do it, but now I’m not so sure... In application code, prepared statements want to say:`select * from items where color_id = ?` and that `?` might be a int or null, so that doesn’t work. I used `is not distinctfrom` instead of =, which has the right meaning, but now I notice it doesn’t use the index for queries that replace`=` with `is not distinct from`, and queries run much slower. Using `explain` confirms: it’s doing sequential scanswhere `=` was using index. So… is this bad DB design to use null to mean that an item has no color? Should I instead put a special row in `colors`,maybe with id = 0, to represent the “no color” value? Or is there some way to make an index work with nulls and`is not distinct from`? thank you, Rob
On June 26, 2015 11:59:05 PM Robert Nikander wrote: > Hi, > > (Maybe my subject line should be: `is not distinct from` and indexes.) > > In Postgres 9.4, I’ve got a table of ‘items’ that references a table > ‘colors’. Not all items have colors, so I created a nullable column in > items like: > > color_id bigint references colors > > There is also an index on color_id: > > create index on items (color_id); > > I thought this was the right way to do it, but now I’m not so sure... In > application code, prepared statements want to say: `select * from items > where color_id = ?` and that `?` might be a int or null, so that doesn’t > work. I used `is not distinct from` instead of =, which has the right > meaning, but now I notice it doesn’t use the index for queries that replace > `=` with `is not distinct from`, and queries run much slower. Using > `explain` confirms: it’s doing sequential scans where `=` was using index. I test for NULL in my application code and emit '... WHERE foo = ?' if the value is not NULL and '... WHERE foo IS NOT NULL' otherwise. ISTR that that actually uses indexes. > > So… is this bad DB design to use null to mean that an item has no color? > Should I instead put a special row in `colors`, maybe with id = 0, to > represent the “no color” value? Or is there some way to make an index work > with nulls and `is not distinct from`? > > thank you, > Rob
On Friday, June 26, 2015, Robert Nikander <rob.nikander@gmail.com> wrote:
I would also likely just make the text value the unique key and forget the surrogate integer key.
So… is this bad DB design to use null to mean that an item has no color? Should I instead put a special row in `colors`, maybe with id = 0, to represent the “no color” value? Or is there some way to make an index work with nulls and `is not distinct from`?
Not sure about getting is distinct to work with indexes but in this particular case I would add a "colorless" color to the table and make the column constraint not null.
David J.
On Fri, Jun 26, 2015 at 10:59 PM, Robert Nikander <rob.nikander@gmail.com> wrote:
-- Hi,
(Maybe my subject line should be: `is not distinct from` and indexes.)
In Postgres 9.4, I’ve got a table of ‘items’ that references a table ‘colors’. Not all items have colors, so I created a nullable column in items like:
color_id bigint references colors
There is also an index on color_id:
create index on items (color_id);
I thought this was the right way to do it, but now I’m not so sure... In application code, prepared statements want to say: `select * from items where color_id = ?` and that `?` might be a int or null, so that doesn’t work. I used `is not distinct from` instead of =, which has the right meaning, but now I notice it doesn’t use the index for queries that replace `=` with `is not distinct from`, and queries run much slower. Using `explain` confirms: it’s doing sequential scans where `=` was using index.
So… is this bad DB design to use null to mean that an item has no color? Should I instead put a special row in `colors`, maybe with id = 0, to represent the “no color” value? Or is there some way to make an index work with nulls and `is not distinct from`?
_My_ personal opinion is that NULL should _never_ be used to "indicate" anything at all, other than something like "?I don't have that information!" NULL should be anathema because, as you have seen, just complicates coding SQL queries.
What I have read (Joe Celko mainly) is that it is better to have "encoded special values" for "unusual" things. In your case, what does "no color" really mean? Does it mean "it has some color, but I don't know what it is" or is it more like a "transparent" substance, such as pure glass? Or is it "reflective" so that it is seems to have the color of what ever color of light is illuminating it? Maybe a perfect mirror of some sort. Are the "normal" color_id values always non-negative (0 or greater)? If so then I'd encode color_id something like -1 == "some unknown color", -2 == "transparent", -3 == "reflective", or some other meaning. One example that we actually have at work is in an "expiration date" on certain types of data files. Most of the time, this is just a normal date like 2017-01-20. But, sometimes, we want codes to indicate things such as "keep the last 3 copies, and expire previous copies" or "keep until it has not been referenced in 90 days" or "keep forever". We either have to have some "special" date values to indicate these latter non-date case, or we need yet another column for "type of expiration" which would be values for "use date", "permanent", "keep ??? days non-reference", "keep ??? copies" and so forth. We actually don't use an SQL date field, but a text field which we can test and, if necessary, convert to a date or to some "explanatory" text.
thank you,
Rob
Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
My sister opened a computer store in Hawaii. She sells C shells down by the seashore.
My sister opened a computer store in Hawaii. She sells C shells down by the seashore.
If someone tell you that nothing is impossible:
Ask him to dribble a football.
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
> On 27 Jun 2015, at 5:59, Robert Nikander <rob.nikander@gmail.com> wrote: > In application code, prepared statements want to say: `select * from items where color_id = ?` and that `?` might be aint or null, so that doesn’t work. You could add another parameter to test which expression to 'activate' in the query, something like: select * from items where ('notnull' = ? and color_id = ?) or ('null' = ? and color_id is null); Of course, with those _positional_ query parameters that means you need to add the same value TWICE into the query. You wouldn'tneed to with _named_ query parameters, if those are available to you. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.