Re: effective SELECT from child tables - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: effective SELECT from child tables
Date
Msg-id 20051004142940.GG40138@pervasive.com
Whole thread Raw
In response to Re: effective SELECT from child tables  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Sat, Oct 01, 2005 at 07:59:11PM +0100, Simon Riggs wrote:
> On Sat, 2005-10-01 at 10:57 -0500, Jim C. Nasby wrote:
> > To clarify, this is a hard-coded implementation of what I'm asking for:
> > http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell:
> > 
> > CREATE TABLE log_other (
> >     project_id  smallint NOT NULL
> >     ...
> > )
> > 
> > CREATE TABLE log_8 (
> >     -- No project_id
> >     ...
> > )
> > CREATE TABLE log_24, log_25, log_5...
> > CREATE VIEW log AS
> >     SELECT * FROM log_other
> >     UNION ALL SELECT 8 AS project_id, * FROM log_8
> >     ...
> > 
> > So the end result is that for cases where project_id is 5, 8, 24, or 25,
> > the data will be stored in tables that don't have the project_id.
> > 
> > If I were to use this on the main table for
> > http://stats.distributed.net, which has ~130M rows, I would be able to
> > save 130M*4 bytes (4 instead of 2 due to alignment), or 520MB. The logdb
> > will have many times that number of rows, so the savings will be even
> > larger.
> > 
> > Note that this technique wouldn't help at all for something like date
> > partitioning, because you have to store the date in the partitioned
> > table.
> 
> Jim,
> 
> Your idea was noted before and actually; I mentioned it to show that I
> listen and take note of ideas from any source.
> 
> For everybody, I would note that the current behaviour is exactly the
> way that List Partitioning works on other systems.
> 
> The cost of this technique is only paid if you choose to partition on
> something that you would not otherwise have included in your table. In
> many cases, you'll choose a column that would have been in the table if
> you created one big table so the additional cost is zero.

Well, the idea is to be more space efficient than if one big table was
used. This is unique to this class of partitioning problems.

> In your example, I would expect to see project_id in a superclass table
> and so there would be no cost.

Superclass table?

> The idea is neat, but IMHO the potential saving of this idea is not big
> enough for me to prioritise that very highly over other items at this
> time.

Certainly. I only chimed in with a specific example so people could
better understand what the idea was. I know it's on the list and might
be addressed at some point. In the mean time it's not too horrible to
hard-code a solution.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Vacuum and Transactions
Next
From: Martijn van Oosterhout
Date:
Subject: Re: [PERFORM] A Better External Sort?