Home > mailing lists

Re: Partitioned table performance - Mailing list pgsql-performance

From	Stacy White
Subject	Re: Partitioned table performance
Date	December 15, 2004 06:08:20
Msg-id	001f01c4e26c$99719680$0200a8c0@grownups Whole thread
In response to	Partitioned table performance ("Stacy White" <harsh@computer.org>)
Responses	Re: Partitioned table performance
List	pgsql-performance

Tree view

Josh,

You're absolutely correct that the overhead becomes less significant as the
partitioning prunes more rows.  I can even see a two-partition table being
useful in some situations (e.g., a table divided into a relatively small
"recent data" partition and a much larger "historical data" partition).  The
break-even point is when your partitioning scheme prunes 20% of the rows
(assuming you're using the inheritance based scheme).

Thanks again for the reply.  So it sounds like the answer to my original
question is that it's expected that the pseudo-partitioning would introduce
a fairly significant amount of overhead.  Correct?



----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: <pgsql-performance@postgresql.org>
Cc: "Stacy White" <harsh@computer.org>
Sent: Friday, December 10, 2004 9:52 PM
Subject: Re: [PERFORM] Partitioned table performance


Stacy,

> Each set of test tables holds 1,000,000 tuples with a partition value of
> '1', and 1,000,000 with a partition value of '2'. The bar* columns are all
> set to non-null values. The 'one_big_foo' table stores all 2M rows in one
> table. 'super_foo' and 'union_foo' split the data into two tables, and use
> inheritance and union views (respectively) to tie them together, as
> described in my previous message.
>
> Query timings and 'EXPLAIN ANALYZE' results for full table scans and for
> partition scans follow:

Hmmm .... interesting.   I think you've demonstrated that
pseudo-partitioning
doesn't pay for having only 2 partitions.   Examine this:

-> Index Scan using idx_sub_foo2_partition on sub_foo2
super_foo (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221
rows=0 loops=1)
Index Cond: (partition = 1::numeric)
Total runtime: 15670.463 ms

As you see, even though the aggregate operation requires a seq scan, the
planner is still able to scan, and discard, sub_foo2, using its index in 0.2
seconds.  Unfortunately, super_foo still needs to contend with:

-> Append (cost=0.00..28376.79 rows=1000064 width=4) (actual
time=6.699..12072.483 rows=1000000 loops=1)

Right there, in the Append, you lose 6 seconds.   This means that
pseudo-partitioning via inheritance will become a speed gain once you can
"make up" that 6 seconds by being able to discard more partitions.   If you
want, do a test with 6 partitions instead of 2 and let us know how it comes
out.

Also, keep in mind that there are other reasons to do pseudo-partitioning
than
your example.  Data write performance, expiring partitions, and vacuum are
big reasons that can motivate partitioning even in cases when selects are
slower.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly

pgsql-performance by date:

From: Simon Riggs
Date: 14 December 2004, 23:15:40
Subject: Re: Speeding up pg_dump

From: sarlav kumar
Date: 15 December 2004, 14:38:28
Subject: \d output to a file

Re: Partitioned table performance - Mailing list pgsql-performance

Previous

Next