Removing nulls with 6NF - Mailing list pgsql-performance

From Robert DiFalco
Subject Removing nulls with 6NF
Date
Msg-id CAAXGW-w=gbJp3eWbT1m7DGakW3Uent=fsj10nUghyZ1vCjiBHA@mail.gmail.com
Whole thread Raw
List pgsql-performance
I have a question about replacing NULL columns with tables. Say I have a table called Contacts that represents the Contacts (say from Facebook) of my Users. Right now Contacts has several columns and in addition to these it has an Owner (the user that owns this contact) and User. The User is NULL if there is no user in my system that matches the contact. The Owner can never be NULL. 

If I want to get all the contacts for a user and know which are and which are not themselves users I would do an OUTER join with the Users table on user_id (which may or may not be null). Or I may not want the user itself and not need to perform the OUTER join at all. I just want to see if they are NULL or a valid ID from the Users table.

So, if I endeavor to get rid of this kind of usage of NULL I could create another table called ContactUsers with two fields (ContactID, UserID).

I'd go from this:

    Contacts (ContactID, OwnerID, UserID, ...)

To this:

    Contacts(ContactID, OwnerID, ... )
    ContactUsers( ContactID, UserID)

So my questions are these:

    I assume my new queries of all contacts with or without correlated users would now change to all need a User column that has a SELECT query on ContactUsers for that column.

How would this perform relative to the approach that uses a field and NULL.

    I *think* updates will actually be faster because instead of updating I will simply insert into ContactUsers and not have to lookup the Contact row and update its UserID field. But I'm concerned about queries.

   I'm also concerned about when a User changes their details. Now I will have to DELETE from the ContactUsers table then INSERT if there are different correlations. Before I would just update the contact rows with the modified user to NULL and re-correlate.

Should I be concerned about the performance of this approach? In addition to getting rid of NULLs and approach like this would help me reduce the number of columns in some tables. But that SELECT for each column thing has me concerned since the queries are very fast now using a NULL field.


Thanks!

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Time of query result delivery
Next
From:
Date:
Subject: PostgreSQL 9.3.2 Performance issues