Re: Sponsoring enterprise features - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Sponsoring enterprise features
Date
Msg-id 871xs02gaf.fsf@stark.dyndns.tv
Whole thread Raw
In response to Re: Sponsoring enterprise features  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:

> I'm a little unclear, personally, about what can be accomplished through table 
> partitioning that we can't currently do through partial indexes and inherited 
> tables, especially after Gavin finishes his tablespaces patch (btw, Gavin 
> could use sponsorship on that one, I think).  Can you make your case to 
> me/the list?   So far, the only arguments we've gotten on this list have been 
> of the "Oracle does it that way" variety so it'd be interesting to see 
> something concrete.

Well probably everyone who wants it is saying things of the form "they were a
useful with Oracle because...". Which isn't the same thing as "Oracle does it
this way". I don't particularly care how partitioned tables are *implemented*,
only the net effect. You can think of them as an abstraction over inherited
tables that let the database guarantee your data integrity and offer query
optimizations in a way it cannot if you build it by hand.  

I know for us they were an absolute godsend. The main advantages over a single
monolithic table even with partial indexes are:

1) Being able to load and unload parts of the table quickly.

  Adding and removing a partition is basically a DDL operation, not DML. It  doesn't have to visit every tuple and mark
itdeleted or added. It just has  to add or remove the entire partition to the structure.
 
  Partitioned tables are frequently used for aging out old data. The common  example is of having a partition per month
andkeeping 3-12 months of data.  We had a more extreme case where we had one partition per day and kept 21  days of
data.    When we implemented partitioned tables the time to archive and delete the  old data went from taking most of
thenight and killing production  performance to effectively instantaneous and we were able to run it at peak  time.
 

2) Being able to do a sequential scan of a partition. 
  Sequential scans are faster than index scans. Sometimes much faster.  Partial indexes are nice but when they cover
10-20%of your table scanning  them is much slower than a sequential scan of a partition.
 


As for inherited tables. Well, I would expect a partitioned tables scheme to
be implemented using inherited tables or just using views. You could jury-rig
it today using these tools, it would just be very awkward and fragile. The
original Oracle implementation in Oracle 7 was implemented much the same way
using views. They were a complete hack and required lots of manual tweaking
though.


The point of partitioned tables is

a) The database ensures tuples go into the correct partition. If you used a  manually constructed view or inherited
tablesyou would always run the risk  of inserting into the wrong partition which would break your data  integrity.
 

b) The database automatically optimizes queries to query the correct  partitions. It detects clauses in the query much
likepartial indexes so  you don't have to tweak every query by hand and the database can skip  clauses that match the
partitionclause exactly. Also this is a prime  opportunity for the database to introduce parallel queries because each
partitioncan be accessed independently.
 


-- 
greg



pgsql-hackers by date:

Previous
From: Rod Taylor
Date:
Subject: Re: Anyone working on pg_dump dependency ordering?
Next
From: Andreas Pflug
Date:
Subject: Re: Anyone working on pg_dump dependency ordering?