Thread: table partitioning and select max(id)

table partitioning and select max(id)

From

Tobias Brox

Date:

04 February 2011, 22:25:12

I implemented table partitioning, and it caused havoc with a "select
max(id)" on the parent table - the query plan has changed from a
lightningly fast backwards index scan to a deadly seq scan.  Both
partitions are set up with primary key index and draws new IDs from
the same sequence ... "select max(id)" on both partitions are fast.
Are there any tricks I can do to speed up this query?  I can't add the
ID to the table constraints, we may still get in "old" data causing
rows with fresh IDs to get into the old table.

(I decided to keep this short rather than include lots of details -
but at least worth mentioning that we're using PG9)

Re: table partitioning and select max(id)

From

Ken Cox

Date:

05 February 2011, 02:39:53

This is a known limitation of partitioning.  One solution is to use a
recursive stored proc, which can use indexes.  Such a solution is
discussed here:
http://archives.postgresql.org/pgsql-performance/2009-09/msg00036.php

Regards,
Ken

http://archives.postgresql.org/pgsql-performance/2009-09/msg00036.php

On Fri, Feb 4, 2011 at 6:24 PM, Tobias Brox <tobixen@gmail.com> wrote:
> I implemented table partitioning, and it caused havoc with a "select
> max(id)" on the parent table - the query plan has changed from a
> lightningly fast backwards index scan to a deadly seq scan.  Both
> partitions are set up with primary key index and draws new IDs from
> the same sequence ... "select max(id)" on both partitions are fast.
> Are there any tricks I can do to speed up this query?  I can't add the
> ID to the table constraints, we may still get in "old" data causing
> rows with fresh IDs to get into the old table.
>
> (I decided to keep this short rather than include lots of details -
> but at least worth mentioning that we're using PG9)
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



--
-Ken

Re: table partitioning and select max(id)

From

Greg Smith

Date:

05 February 2011, 04:03:20

Tobias Brox wrote:
> I implemented table partitioning, and it caused havoc with a "select
> max(id)" on the parent table - the query plan has changed from a
> lightningly fast backwards index scan to a deadly seq scan.

This problem was fixed in the upcoming 9.1:

http://archives.postgresql.org/pgsql-committers/2010-11/msg00028.php
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=034967bdcbb0c7be61d0500955226e1234ec5f04

Here's the comment from that describing the main technique used to fix it:

"This module tries to replace MIN/MAX aggregate functions by subqueries
of the form

(SELECT col FROM tab WHERE ... ORDER BY col ASC/DESC LIMIT 1)

Given a suitable index on tab.col, this can be much faster than the
generic scan-all-the-rows aggregation plan.  We can handle multiple
MIN/MAX aggregates by generating multiple subqueries, and their
orderings can be different.  However, if the query contains any
non-optimizable aggregates, there's no point since we'll have to scan
all the rows anyway."

Unfortunately that change ends a series of 6 commits of optimizer
refactoring in this area, so it's not the case that you just apply this
one commit as a bug-fix to a 9.0 system.  I have a project in process to
do the full backport needed I might be able to share with you if that
works out, and you're willing to run with a customer patched server
process.  Using one of the user-space ideas Ken suggested may very well
be easier for you.  I'm stuck with an app I can't rewrite to do that.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: table partitioning and select max(id)

From

Tobias Brox

Date:

05 February 2011, 07:16:38

[Greg Smith]
> Here's the comment from that describing the main technique used to fix it:
>
> "This module tries to replace MIN/MAX aggregate functions by subqueries of
> the form
>
> (SELECT col FROM tab WHERE ... ORDER BY col ASC/DESC LIMIT 1)

Huh ... that sounds a bit like pg 8.0 to me ;-)  I remember on 7.x one
had to write "select id from table order by id desc limit 1" to force
through a quick index scan.  This was fixed in 8.0 IIRC.  I did test
"select id from table order by id desc limit 1" on my parent table
yesterday, it would still do the seq-scan.  Even adding a
where-restriction to make sure only one partition was queried I still
got the seq-scan.

> Unfortunately that change ends a series of 6 commits of optimizer
> refactoring in this area, so it's not the case that you just apply this one
> commit as a bug-fix to a 9.0 system.  I have a project in process to do the
> full backport needed I might be able to share with you if that works out,
> and you're willing to run with a customer patched server process.

In this particular case, "wait for 9.1" seems to be the best option :-)

Re: table partitioning and select max(id)

From

Greg Smith

Date:

05 February 2011, 07:49:34

Tobias Brox wrote:
> I did test "select id from table order by id desc limit 1" on my parent table
> yesterday, it would still do the seq-scan.  Even adding a
> where-restriction to make sure only one partition was queried I still
> got the seq-scan.
>

Right; you actually have to direct the query toward the specific
partition by name, nothing run against the parent table will work.  The
new logic for 9.1 essentially splits the query into this alternate form,
runs it against every partition individually, then combines the
results.  If you can afford to wait for 9.1, that is certainly the easy
path here.  It just works out of the box in that version.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books