Partitioned table performance - Mailing list pgsql-performance
From | Stacy White |
---|---|
Subject | Partitioned table performance |
Date | |
Msg-id | 000a01c4da74$84e072e0$0200a8c0@grownups Whole thread Raw |
Responses |
Re: Partitioned table performance
|
List | pgsql-performance |
We're working with a Postgres database that includes a fairly large table (100M rows, increasing at around 2M per day). In some cases we've seen some increased performance in tests by splitting the table into several smaller tables. Both 'UNION ALL' views, and the superclass/subclass scheme work well at pruning down the set of rows a query uses, but they seem to introduce a large performance hit to the time to process each row (~50% for superclass/subclass, and ~150% for union views). Is this to be expected? Or is this a problem with our test setup? I've listed details on our tests at the end of this message. The results are similar with our larger tables; the overhead appears to be per record returned from the subquery/subclass; it's not a constant overhead per query. Our production instance is running 7.4.2, but the results are the same on 8.0. For reference, I tested with this setup (for the superclass/subclass partitioning scheme): CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC ); ANALYZE super_foo ; CREATE TABLE sub_foo1 () INHERITS ( super_foo ); INSERT INTO sub_foo1 VALUES ( 1, 1 ); -- repeat insert until sub_foo1 has 1,000,000 rows CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition ); ANALYZE sub_foo1 ; CREATE TABLE sub_foo2 () INHERITS ( super_foo ); INSERT INTO sub_foo2 VALUES ( 2, 1 ); -- repeat insert until sub_foo2 has 1,000,000 rows CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition ); ANALYZE sub_foo2 ; and this setup for the union all scheme: CREATE TABLE union_foo1 ( bar NUMERIC ); INSERT INTO union_foo1 VALUES ( 1 ) ; -- repeat insert until union_foo1 has 1,000,000 rows ANALYZE union_foo1 ; CREATE TABLE union_foo2 ( bar NUMERIC ); INSERT INTO union_foo2 VALUES ( 1 ) ; -- repeat insert until union_foo2 has 1,000,000 rows ANALYZE union_foo2 ; CREATE VIEW union_foo AS SELECT 1 AS partition, * FROM union_foo1 UNION ALL SELECT 2 AS partition, * FROM union_foo2 ; The partition pruning works marvelously: EXPLAIN SELECT SUM(bar) FROM super_foo WHERE partition = 2 ; QUERY PLAN --------------------------------------------------------------------------- ---------------------------------- Aggregate (cost=21899.02..21899.02 rows=1 width=32) -> Append (cost=0.00..19399.01 rows=1000002 width=32) -> Seq Scan on super_foo (cost=0.00..0.00 rows=1 width=32) Filter: (partition = 2::numeric) -> Index Scan using idx_subfoo1_partition on sub_foo1 super_foo (cost=0.00..2.01 rows=1 width=10) Index Cond: (partition = 2::numeric) -> Seq Scan on sub_foo2 super_foo (cost=0.00..19397.00 rows=1000000 width=10) Filter: (partition = 2::numeric) and EXPLAIN SELECT SUM(bar) FROM union_foo WHERE partition = 2 ; QUERY PLAN --------------------------------------------------------------------------- ---------------------- Aggregate (cost=75819.15..75819.15 rows=1 width=32) -> Subquery Scan union_foo (cost=0.00..70818.60 rows=2000220 width=32) -> Append (cost=0.00..50816.40 rows=2000220 width=10) -> Subquery Scan "*SELECT* 1" (cost=0.00..25408.20 rows=1000110 width=10) -> Result (cost=0.00..15407.10 rows=1000110 width=10) One-Time Filter: false -> Seq Scan on union_foo1 (cost=0.00..15407.10 rows=1000110 width=10) -> Subquery Scan "*SELECT* 2" (cost=0.00..25408.20 rows=1000110 width=10) -> Seq Scan on union_foo2 (cost=0.00..15407.10 rows=1000110 width=10) But you can see a fair amount of overhead, espcially in the case of the union view: SELECT SUM(bar) FROM sub_foo1 UNION ALL SELECT SUM(bar) FROM sub_foo2 ; Time: 2291.637 ms SELECT SUM(bar) FROM union_foo1 UNION ALL SELECT SUM(bar) FROM union_foo2 ; Time: 2248.225 ms SELECT SUM(bar) FROM super_foo ; Time: 3329.953 ms SELECT SUM(bar) FROM union_foo ; Time: 5267.742 ms SELECT SUM(bar) FROM sub_foo2 ; Time: 1124.496 ms SELECT SUM(bar) FROM union_foo2 ; Time: 1090.616 ms SELECT SUM(bar) FROM super_foo WHERE partition = 2 ; Time: 2137.767 ms SELECT SUM(bar) FROM union_foo WHERE partition = 2 ; Time: 2774.887 ms
pgsql-performance by date: