Re: Partitioned table performance - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Partitioned table performance
Date
Msg-id 20041221231155.GV18180@decibel.org
Whole thread Raw
In response to Re: Partitioned table performance  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
On Wed, Dec 15, 2004 at 11:56:40AM -0800, Josh Berkus wrote:
> Greg,
>
> > Well Oracle has lots of partitioning intelligence pushed up to the planner
> > to avoid overhead.
> >
> > If you have a query with something like "WHERE date = '2004-01-01'" and
> > date is your partition key (even if it's a range) then Oracle will figure
> > out which partition it will need at planning time.
>
> Hmmm ... well, we're looking at making a spec for Postgres Table Partitioning.
> Maybe you could help?

This is something I've been thinking about doing for
http://stats.distributed.net; is there a formal project for this
somewhere?

On a different note, has anyone looked at the savings you get by
ommitting the partition field from the child tables? ISTM that the
savings would be substantial for narrow tables. Of course that most
likely means doing a union view instead of inheritence, but I'm guessing
here. The table I'm thinking of partitioning is quite narrow (see
below), so I suspect that dropping project_id out would result in a
substantial savings (there's basically nothing that ever queries across
the whole table). With the data distribution, I suspect just breaking
project ID's 205, 5, and 25 into partitioned tables that didn't contain
project_id would save about 450M (4bytes * 95% * 130M).

(the table has ~130M rows)

   Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# select * from pg_stats where tablename='email_contrib' and
attname='project_id';
 schemaname |   tablename   |  attname   | null_frac | avg_width | n_distinct | most_common_vals  |
most_common_freqs| histogram_bounds | correlation  

------------+---------------+------------+-----------+-----------+------------+-------------------+---------------------------------------------------------+------------------+-------------
  public     | email_contrib | project_id |         0 |         4 | 6 | {205,5,25,8,24,3} |
{0.461133,0.4455,0.0444333,0.0418667,0.0049,0.00216667}| |    0.703936 
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Partitioned table performance
Next
From: "Stacy White"
Date:
Subject: Re: Partitioned table performance