Re: Transparent table partitioning in future version of PG? - Mailing list pgsql-performance
From | david@lang.hm |
---|---|
Subject | Re: Transparent table partitioning in future version of PG? |
Date | |
Msg-id | alpine.DEB.1.10.0905081043340.15782@asgard |
In response to | Re: Transparent table partitioning in future version of PG? (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Transparent table partitioning in future version of PG?
Re: Transparent table partitioning in future version of PG? |
List | pgsql-performance |
On Fri, 8 May 2009, Robert Haas wrote:

> On Thu, May 7, 2009 at 10:52 PM, <david@lang.hm> wrote:
>>>> Hopefully, notions of partitioning won't be directly tied to chunking of
>>>> data for parallel query access. Most queries access recent data and
>>>> hence only a single partition (or stripe), so partitioning and
>>>> parallelism are frequently exactly orthogonal.
>>>
>>> Yes, I think those things are unrelated.
>>
>> I'm not so sure (warning, I am relatively inexperienced in this area)
>>
>> it sounds like you can take two basic approaches to partition a database
>>
>> 1. The Isolation Plan
> [...]
>> 2. The Load Balancing Plan
>
> Well, even if the table is not partitioned at all, I don't see that it
> should preclude parallel query access. If I've got a 1 GB table that
> needs to be sequentially scanned for rows meeting some restriction
> clause, and I have two CPUs and plenty of I/O bandwidth, ISTM it
> should be possible to have them each scan half of the table and
> combine the results. Now, this is not easy and there are probably
> substantial planner and executor changes required to make it work, but
> I don't know that it would be particularly easier if I had two 500 MB
> partitions instead of a single 1 GB table.
>
> IOW, I don't think you should need to partition if all you want is
> load balancing. Partitioning should be for isolation, and load
> balancing should happen when appropriate, whether there is
> partitioning involved or not.

actually, I will contradict myself slightly.

with the Isolation Plan there is not necessarily a need to run the query on each partition in parallel.

if parallel queries are possible, they will benefit Isolation Plan partitioning, but the biggest win with this plan is just reducing the number of partitions that need to be queried.
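
To make the Isolation Plan win concrete, here is a rough sketch of picking which partitions a query has to touch at all. The month-based layout, the partition names, and the `partitions_to_scan` helper are all made up for illustration; this is not how PostgreSQL implements anything, just the idea of skipping partitions whose range cannot match.

```python
from datetime import date

# Hypothetical layout: one partition per month, each covering [start, end).
PARTITIONS = {
    "events_2009_03": (date(2009, 3, 1), date(2009, 4, 1)),
    "events_2009_04": (date(2009, 4, 1), date(2009, 5, 1)),
    "events_2009_05": (date(2009, 5, 1), date(2009, 6, 1)),
}


def partitions_to_scan(lo, hi):
    """Return the names of partitions whose range overlaps [lo, hi)."""
    return sorted(
        name
        for name, (start, end) in PARTITIONS.items()
        if start < hi and lo < end  # standard half-open interval overlap test
    )


# A query over recent data only needs the newest partition.
print(partitions_to_scan(date(2009, 5, 1), date(2009, 5, 8)))
```

A query restricted to the first week of May touches one partition out of three, and no parallelism is involved in getting that saving.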
with the Load Balancing Plan there is no benefit in partitioning unless you have the ability to run queries on each partition in parallel.

using a separate back-end process to do a query on a separate partition is a fairly straightforward, but not trivial, thing to do (there are complications in merging the result sets, including the need to be able to do part of a query, merge the results, then use those results for the next step in the query).

I would also note that there does not seem to be a huge conceptual difference between doing these parallel queries on one computer and shipping the queries off to other computers.

however, trying to split the work on a single table runs into all sorts of 'interesting' issues with things needing to be shared between the multiple processes (they both need to use the same indexes, for example).

so I think that it is much easier for the database engine to efficiently search two 500G tables instead of one 1T table.

David Lang
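
As a rough illustration of the merge complication mentioned above (do part of a query per partition, merge, then feed the merged result into the next step), here is a small sketch. Everything in it is an assumption made for the example: the partitions are plain Python lists, the function names are invented, and threads stand in for the separate back-end processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each one is just a list of integer values.
PARTITIONS = [
    [10, 20, 30],
    [40, 50],
    [60, 70, 80, 90],
]


def partial_sum_count(rows):
    """Step 1 worker: partial aggregate for a single partition."""
    return sum(rows), len(rows)


def count_above(rows, threshold):
    """Step 2 worker: depends on the merged result of step 1."""
    return sum(1 for r in rows if r > threshold)


def rows_above_average(partitions):
    # Threads stand in for separate back-end processes in this sketch.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # Step 1: run the partial aggregate on every partition in parallel.
        partials = list(pool.map(partial_sum_count, partitions))
        # Merge: combine the per-partition (sum, count) pairs.
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        average = total / count
        # Step 2: a second parallel pass that uses the merged result.
        above = pool.map(lambda rows: count_above(rows, average), partitions)
        return average, sum(above)


print(rows_above_average(PARTITIONS))
```

The second pass cannot start until every partition has reported its partial result, which is the kind of coordination step that makes this less than trivial.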