On Sat, 2005-10-01 at 10:57 -0500, Jim C. Nasby wrote:
> To clarify, this is a hard-coded implementation of what I'm asking for:
> http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell:
>
> CREATE TABLE log_other (
> project_id smallint NOT NULL
> ...
> )
>
> CREATE TABLE log_8 (
> -- No project_id
> ...
> )
> CREATE TABLE log_24, log_25, log_5...
> CREATE VIEW log AS
> SELECT * FROM log_other
> UNION ALL SELECT 8 AS project_id, * FROM log_8
> ...
>
> So the end result is that for cases where project_id is 5, 8, 24, or 25,
> the data will be stored in tables that don't have the project_id.
>
> If I were to use this on the main table for
> http://stats.distributed.net, which has ~130M rows, I would be able to
> save 130M*4 bytes (4 instead of 2 due to alignment), or 520MB. The logdb
> will have many times that number of rows, so the savings will be even
> larger.
>
> Note that this technique wouldn't help at all for something like date
> partitioning, because you have to store the date in the partitioned
> table.
Jim,
Your idea was noted before and actually; I mentioned it to show that I
listen and take note of ideas from any source.
For everybody, I would note that the current behaviour is exactly the
way that List Partitioning works on other systems.
The cost of this technique is only paid if you choose to partition on
something that you would not otherwise have included in your table. In
many cases, you'll choose a column that would have been in the table if
you created one big table so the additional cost is zero.
In your example, I would expect to see project_id in a superclass table
and so there would be no cost.
The idea is neat, but IMHO the potential saving of this idea is not big
enough for me to prioritise that very highly over other items at this
time.
Best Regards, Simon Riggs