Partitioned table performance - Mailing list pgsql-performance

From Stacy White
Subject Partitioned table performance
Date
Msg-id 000a01c4da74$84e072e0$0200a8c0@grownups
Whole thread Raw
Responses Re: Partitioned table performance
List pgsql-performance
We're working with a Postgres database that includes a fairly large table
(100M rows, increasing at around 2M per day).

In some cases we've seen some increased performance in tests by splitting
the table into several smaller tables.  Both 'UNION ALL' views, and the
superclass/subclass scheme work well at pruning down the set of rows a query
uses, but they seem to introduce a large performance hit to the time to
process each row (~50% for superclass/subclass, and ~150% for union views).

Is this to be expected?  Or is this a problem with our test setup?

I've listed details on our tests at the end of this message.  The results
are similar with our larger tables; the overhead appears to be per record
returned from the subquery/subclass; it's not a constant overhead per query.
Our production instance is running 7.4.2, but the results are the same on
8.0.

For reference, I tested with this setup (for the superclass/subclass
partitioning scheme):

   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
   ANALYZE super_foo ;

   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
   INSERT INTO sub_foo1 VALUES ( 1, 1 );
   -- repeat insert until sub_foo1 has 1,000,000 rows
   CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
   ANALYZE sub_foo1 ;

   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
   INSERT INTO sub_foo2 VALUES ( 2, 1 );
   -- repeat insert until sub_foo2 has 1,000,000 rows
   CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
   ANALYZE sub_foo2 ;

and this setup for the union all scheme:

   CREATE TABLE union_foo1 ( bar NUMERIC );
   INSERT INTO union_foo1 VALUES ( 1 ) ;
   -- repeat insert until union_foo1 has 1,000,000 rows
   ANALYZE union_foo1 ;

   CREATE TABLE union_foo2 ( bar NUMERIC );
   INSERT INTO union_foo2 VALUES ( 1 ) ;
   -- repeat insert until union_foo2 has 1,000,000 rows
   ANALYZE union_foo2 ;

   CREATE VIEW union_foo AS
      SELECT 1 AS partition, * FROM union_foo1
      UNION ALL
      SELECT 2 AS partition, * FROM union_foo2 ;

The partition pruning works marvelously:

  EXPLAIN SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
                                                   QUERY PLAN
 ---------------------------------------------------------------------------
----------------------------------
 Aggregate  (cost=21899.02..21899.02 rows=1 width=32)
   ->  Append  (cost=0.00..19399.01 rows=1000002 width=32)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=32)
               Filter: (partition = 2::numeric)
         ->  Index Scan using idx_subfoo1_partition on sub_foo1 super_foo
(cost=0.00..2.01 rows=1 width=10)
               Index Cond: (partition = 2::numeric)
         ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..19397.00
rows=1000000 width=10)
               Filter: (partition = 2::numeric)

and

  EXPLAIN SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
                                             QUERY PLAN
 ---------------------------------------------------------------------------
----------------------
 Aggregate  (cost=75819.15..75819.15 rows=1 width=32)
   ->  Subquery Scan union_foo  (cost=0.00..70818.60 rows=2000220 width=32)
         ->  Append  (cost=0.00..50816.40 rows=2000220 width=10)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..25408.20
rows=1000110 width=10)
                     ->  Result  (cost=0.00..15407.10 rows=1000110 width=10)
                           One-Time Filter: false
                           ->  Seq Scan on union_foo1  (cost=0.00..15407.10
rows=1000110 width=10)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..25408.20
rows=1000110 width=10)
                     ->  Seq Scan on union_foo2  (cost=0.00..15407.10
rows=1000110 width=10)


But you can see a fair amount of overhead, espcially in the case of the
union view:

  SELECT SUM(bar) FROM sub_foo1 UNION ALL SELECT SUM(bar) FROM sub_foo2 ;
  Time: 2291.637 ms

  SELECT SUM(bar) FROM union_foo1 UNION ALL SELECT SUM(bar) FROM union_foo2
;
  Time: 2248.225 ms

  SELECT SUM(bar) FROM super_foo ;
  Time: 3329.953 ms

  SELECT SUM(bar) FROM union_foo ;
  Time: 5267.742 ms


  SELECT SUM(bar) FROM sub_foo2 ;
  Time: 1124.496 ms

  SELECT SUM(bar) FROM union_foo2 ;
  Time: 1090.616 ms

  SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
  Time: 2137.767 ms

  SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
  Time: 2774.887 ms


pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Improve BULK insertion
Next
From: Rodrigo Carvalhaes
Date:
Subject: Re: pg_restore taking 4 hours!