Re: Data warehousing requirements - Mailing list pgsql-performance

From Josh Berkus
Subject Re: Data warehousing requirements
Date
Msg-id 200410071550.20665.josh@agliodbs.com
Whole thread Raw
In response to Re: Data warehousing requirements  (Gabriele Bartolini <angusgb@tin.it>)
Responses Re: Data warehousing requirements  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Gabriele,

> That's another interesting argument. Again, I had in mind the space
> efficiency principle and I decided to use null IDs for dimension tables if
> I don't have the information. I noticed though that in those cases I can't
> use any index and performances result very poor.

For one thing, this is false optimization; a NULL isn't saving you any table
size on an INT or BIGINT column.    NULLs are only smaller on variable-width
columns.  If you're going to start counting bytes, make sure it's an informed
count.

More importantly, you should never, ever allow null FKs on a star-topology
database.    LEFT OUTER JOINs are vastly less efficient than INNER JOINs in a
query, and the difference between having 20 outer joins for your data view,
vs 20 regular joins, can easily be a difference of 100x in execution time.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: Bill Montgomery
Date:
Subject: Re: Excessive context switching on SMP Xeons
Next
From: "Aaron Werman"
Date:
Subject: Re: Data warehousing requirements