Re: Partitioned table performance - Mailing list pgsql-performance
From | Stacy White |
---|---|
Subject | Re: Partitioned table performance |
Date | |
Msg-id | 001f01c4e26c$99719680$0200a8c0@grownups Whole thread Raw |
In response to | Partitioned table performance ("Stacy White" <harsh@computer.org>) |
Responses |
Re: Partitioned table performance
|
List | pgsql-performance |
Josh, You're absolutely correct that the overhead becomes less significant as the partitioning prunes more rows. I can even see a two-partition table being useful in some situations (e.g., a table divided into a relatively small "recent data" partition and a much larger "historical data" partition). The break-even point is when your partitioning scheme prunes 20% of the rows (assuming you're using the inheritance based scheme). Thanks again for the reply. So it sounds like the answer to my original question is that it's expected that the pseudo-partitioning would introduce a fairly significant amount of overhead. Correct? ----- Original Message ----- From: "Josh Berkus" <josh@agliodbs.com> To: <pgsql-performance@postgresql.org> Cc: "Stacy White" <harsh@computer.org> Sent: Friday, December 10, 2004 9:52 PM Subject: Re: [PERFORM] Partitioned table performance Stacy, > Each set of test tables holds 1,000,000 tuples with a partition value of > '1', and 1,000,000 with a partition value of '2'. The bar* columns are all > set to non-null values. The 'one_big_foo' table stores all 2M rows in one > table. 'super_foo' and 'union_foo' split the data into two tables, and use > inheritance and union views (respectively) to tie them together, as > described in my previous message. > > Query timings and 'EXPLAIN ANALYZE' results for full table scans and for > partition scans follow: Hmmm .... interesting. I think you've demonstrated that pseudo-partitioning doesn't pay for having only 2 partitions. Examine this: -> Index Scan using idx_sub_foo2_partition on sub_foo2 super_foo (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221 rows=0 loops=1) Index Cond: (partition = 1::numeric) Total runtime: 15670.463 ms As you see, even though the aggregate operation requires a seq scan, the planner is still able to scan, and discard, sub_foo2, using its index in 0.2 seconds. Unfortunately, super_foo still needs to contend with: -> Append (cost=0.00..28376.79 rows=1000064 width=4) (actual time=6.699..12072.483 rows=1000000 loops=1) Right there, in the Append, you lose 6 seconds. This means that pseudo-partitioning via inheritance will become a speed gain once you can "make up" that 6 seconds by being able to discard more partitions. If you want, do a test with 6 partitions instead of 2 and let us know how it comes out. Also, keep in mind that there are other reasons to do pseudo-partitioning than your example. Data write performance, expiring partitions, and vacuum are big reasons that can motivate partitioning even in cases when selects are slower. -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
pgsql-performance by date: