Thread: subselect requires offset 0 for good performance.

subselect requires offset 0 for good performance.

From: Scott Marlowe
Howdy. I seem to have inherited this problem:

http://www.postgresql.org/message-id/EF524E81-B815-4122-A337-7E50BCED514E@silentmedia.com

Basically, a subselect with no offset results in really poor
performance (120s queries), but adding an OFFSET 0 to the inner
subselect brings query times down to 0.5s, and I get the same output.
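
For reference, the shape of the fix looks like this (table and column
names here are placeholders, not our actual schema):

```sql
-- Slow: the planner flattens the subselect into the outer query.
SELECT p.*
FROM wide_tbl p
LEFT JOIN (SELECT sortkey FROM wide_tbl WHERE name IS NOT NULL) pp
  ON p.sortkey BETWEEN pp.sortkey AND tree_right(pp.sortkey)
WHERE pp.sortkey IS NULL;

-- Fast: OFFSET 0 acts as an optimization fence, so the subselect
-- is planned (and scanned) on its own.
SELECT p.*
FROM wide_tbl p
LEFT JOIN (SELECT sortkey FROM wide_tbl
           WHERE name IS NOT NULL OFFSET 0) pp
  ON p.sortkey BETWEEN pp.sortkey AND tree_right(pp.sortkey)
WHERE pp.sortkey IS NULL;
```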

In the original thread, Robert Haas asked for a self-contained test case.

I am running 8.4.15 and can try 8.4.17 if some patch has been applied
to it to address this issue. I just want to know should I

A: upgrade to 8.4.17
or
B: create a self contained test case.

--
To understand recursion, one must first understand recursion.


Re: subselect requires offset 0 for good performance.

From: Merlin Moncure
On Thu, Aug 1, 2013 at 2:40 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Howdy. I seem to have inherited this problem:
>
> http://www.postgresql.org/message-id/EF524E81-B815-4122-A337-7E50BCED514E@silentmedia.com
>
> Basically, a subselect with no offset results in really poor
> performance (120s queries), but adding an OFFSET 0 to the inner
> subselect brings query times down to 0.5s, and I get the same output.
>
> In the original thread, Robert Haas asked for a self-contained test case.
>
> I am running 8.4.15 and can try 8.4.17 if some patch has been applied
> to it to address this issue. I just want to know should I
>
> A: upgrade to 8.4.17
> or
> B: create a self contained test case.

IMNSHO, I would pursue both (unless A solves your problem, in which
case B is moot).

merlin


Re: subselect requires offset 0 for good performance.

From: Tom Lane
Scott Marlowe <scott.marlowe@gmail.com> writes:
> I am running 8.4.15 and can try 8.4.17 if some patch has been applied
> to it to address this issue. I just want to know should I

> A: upgrade to 8.4.17
> or
> B: create a self contained test case.

A quick look at the release notes shows no planner fixes in 8.4.16 or
8.4.17, so it would be rather surprising if (A) helps.

            regards, tom lane


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
On Thu, Aug 1, 2013 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> I am running 8.4.15 and can try 8.4.17 if some patch has been applied
>> to it to address this issue. I just want to know should I
>
>> A: upgrade to 8.4.17
>> or
>> B: create a self contained test case.
>
> A quick look at the release notes shows no planner fixes in 8.4.16 or
> 8.4.17, so it would be rather surprising if (A) helps.

OK. I was doing some initial testing, and if I select just the 4 columns
into a test table, the query runs fast. If I select all the columns
into a test table, it runs slow, so it appears table width affects
this. Will have more to report tomorrow.


Re: subselect requires offset 0 for good performance.

From: Vik Fearing
On 08/02/2013 03:22 AM, Scott Marlowe wrote:
> On Thu, Aug 1, 2013 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Scott Marlowe <scott.marlowe@gmail.com> writes:
>>> I am running 8.4.15 and can try 8.4.17 if some patch has been applied
>>> to it to address this issue. I just want to know should I
>>> A: upgrade to 8.4.17
>>> or
>>> B: create a self contained test case.
>> A quick look at the release notes shows no planner fixes in 8.4.16 or
>> 8.4.17, so it would be rather surprising if (A) helps.
> OK. I was doing some initial testing and if I select out the 4 columns
> into a test table the query runs fast. If I select all the columns
> into a test table it runs slow, so it appears table width affects
> this. Will have more to report tomorrow on it.

I don't know what your query is, but here's one I was working on
yesterday that shows the problem.  It may not be the smallest test case
possible, but it works.

EXPLAIN ANALYZE
WITH RECURSIVE
x (start_time) AS
(
    SELECT generate_series(1, 1000000)
),
t (time, timeround) AS
(
    SELECT time, time - time % 900000 AS timeround
    FROM (SELECT min(start_time) AS time FROM x) AS tmp
    UNION ALL
    SELECT time, time - time % 900000
    FROM (SELECT (SELECT min(start_time) AS time
                  FROM x
                  WHERE start_time >= t.timeround + 900000)
          FROM t
          WHERE t.time IS NOT NULL OFFSET 0
         ) tmp
)
SELECT count(*) FROM t WHERE time IS NOT NULL;

If you remove the OFFSET 0, you'll see two more subplans (because "time"
is referenced three times).  The difference is much more noticeable if
you make the x CTE its own table.

Vik

PS: This query is emulating a LooseIndexScan.
http://wiki.postgresql.org/wiki/Loose_indexscan
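
(If I'm reading the recursion right, it computes the same result as
this much simpler query, which however has to read all of x rather
than skipping from bucket to bucket:)

```sql
-- One row of t per occupied 900000-wide bucket, counted directly:
SELECT count(DISTINCT start_time - start_time % 900000)
FROM (SELECT generate_series(1, 1000000) AS start_time) AS x;
```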


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
On Thu, Aug 1, 2013 at 7:22 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Aug 1, 2013 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Scott Marlowe <scott.marlowe@gmail.com> writes:
>>> I am running 8.4.15 and can try 8.4.17 if some patch has been applied
>>> to it to address this issue. I just want to know should I
>>
>>> A: upgrade to 8.4.17
>>> or
>>> B: create a self contained test case.
>>
>> A quick look at the release notes shows no planner fixes in 8.4.16 or
>> 8.4.17, so it would be rather surprising if (A) helps.
>
> OK. I was doing some initial testing and if I select out the 4 columns
> into a test table the query runs fast. If I select all the columns
> into a test table it runs slow, so it appears table width affects
> this. Will have more to report tomorrow on it.

Here's the query:
SELECT * FROM dba.pp_test_wide p LEFT JOIN
    (
    SELECT tree_sortkey FROM dba.pp_test_wide
    WHERE tree_sortkey BETWEEN '00000000000101010000010001010100'::VARBIT
        AND public.tree_right('00000000000101010000010001010100'::VARBIT)
        AND product_name IS NOT NULL
        AND tree_sortkey <> '00000000000101010000010001010100'::VARBIT
    ) pp
ON p.tree_sortkey BETWEEN pp.tree_sortkey AND public.tree_right(pp.tree_sortkey)
WHERE
    p.tree_sortkey BETWEEN '00000000000101010000010001010100'::VARBIT
        AND public.tree_right('00000000000101010000010001010100'::VARBIT)
    AND p.tree_sortkey BETWEEN '00000000'::VARBIT
        AND public.tree_right('00000000'::VARBIT)
    AND p.deleted_at IS NULL
    AND pp.tree_sortkey IS NULL

I extracted all the data like so:

select * into dba.pp_test_wide from original table;

and get this query plan from explain analyze:
http://explain.depesz.com/s/EPx which takes 20 minutes to run.

If I extract it this way:

select tree_sortkey, product_name, deleted_at into db.pp_test_3col
from original table;

I get this plan: http://explain.depesz.com/s/gru which gets a
materialize in it, and suddenly takes 106 ms.

The performance increase is therefore a factor of ~11,342. That's
pretty huge. I'll try to make a self-contained test case now.
Hopefully that at least points in the right direction to a bug of
some kind, though.


Re: subselect requires offset 0 for good performance.

From: Tom Lane
Scott Marlowe <scott.marlowe@gmail.com> writes:
> I extracted all the data like so:
> select * into dba.pp_test_wide from original table;
> and get this query plan from explain analyze:
> http://explain.depesz.com/s/EPx which takes 20 minutes to run.
> If I extract it this way:
> select tree_sortkey, product_name, deleted_at into db.pp_test_3col
> from original table;
> I get this plan: http://explain.depesz.com/s/gru which gets a
> materialize in it, and suddenly takes 106 ms.

There's no reason why suppressing some unrelated columns would change the
rowcount estimates, but those two plans show different rowcount estimates.

I suspect the *actual* reason for the plan change was that autovacuum had
had a chance to update statistics for the one table, and not yet for the
other.  Please do a manual ANALYZE on both tables and see if there's
still a plan difference.
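
In this case that would be something like (table names taken from
earlier in the thread):

```sql
-- Refresh planner statistics for both copies of the data:
ANALYZE dba.pp_test_wide;
ANALYZE db.pp_test_3col;
```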

            regards, tom lane


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
On Fri, Aug 2, 2013 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> I extracted all the data like so:
>> select * into dba.pp_test_wide from original table;
>> and get this query plan from explain analyze:
>> http://explain.depesz.com/s/EPx which takes 20 minutes to run.
>> If I extract it this way:
>> select tree_sortkey, product_name, deleted_at into db.pp_test_3col
>> from original table;
>> I get this plan: http://explain.depesz.com/s/gru which gets a
>> materialize in it, and suddenly takes 106 ms.
>
> There's no reason why suppressing some unrelated columns would change the
> rowcount estimates, but those two plans show different rowcount estimates.
>
> I suspect the *actual* reason for the plan change was that autovacuum had
> had a chance to update statistics for the one table, and not yet for the
> other.  Please do a manual ANALYZE on both tables and see if there's
> still a plan difference.

Interesting. I ran analyze on both tables, and sure enough the new test
table runs fast. Ran analyze on the old table and it still runs slow.
The only thing the old table and its plan are missing is the materialize.
So what is likely to have changed from the old table to the new one?
Here's the explain analyze output from the old table and the same query
against it: http://explain.depesz.com/s/CtZ and here's the plan with
offset 0 in it: http://explain.depesz.com/s/Gug Note that while the
estimates are a bit off, the really huge difference here says to me that
some suboptimal method is getting deployed in the background
somewhere. Do we need a stack trace?


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
On Fri, Aug 2, 2013 at 1:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Aug 2, 2013 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Scott Marlowe <scott.marlowe@gmail.com> writes:
>>> I extracted all the data like so:
>>> select * into dba.pp_test_wide from original table;
>>> and get this query plan from explain analyze:
>>> http://explain.depesz.com/s/EPx which takes 20 minutes to run.
>>> If I extract it this way:
>>> select tree_sortkey, product_name, deleted_at into db.pp_test_3col
>>> from original table;
>>> I get this plan: http://explain.depesz.com/s/gru which gets a
>>> materialize in it, and suddenly takes 106 ms.
>>
>> There's no reason why suppressing some unrelated columns would change the
>> rowcount estimates, but those two plans show different rowcount estimates.
>>
>> I suspect the *actual* reason for the plan change was that autovacuum had
>> had a chance to update statistics for the one table, and not yet for the
>> other.  Please do a manual ANALYZE on both tables and see if there's
>> still a plan difference.
>
> Interesting. I ran analyze on both tables and sure enough the new test
> table runs fast. Ran analyze on the old table and it runs slow. The
> only thing the old table and its plan are missing is the materialize.
> So what is likely to change from the old table to the new one? Here's
> the explain analyze output from the old table and the same query
> against it: http://explain.depesz.com/s/CtZ and here's the plan with
> offset 0 in it: http://explain.depesz.com/s/Gug note that while the
> estimates are a bit off, the really huge difference here says to me
> some suboptimal method is getting deployed in the background
> somewhere. Do we need a stack trace?

So as a followup, I ran vacuum verbose analyze on the original table,
thinking it might be bloated, but it wasn't. Out of 320k or so rows
there were 4k dead tuples recovered, and none that it couldn't
recover. So then I tried to recreate the original table with a
select into, with an order by random() on the end. Nope, it gets a
materialize in it and runs fast. Well, it's danged hard to make a test
case when copying the table with random ordering results in a much
faster query against the same data. I was at a loss on how to reproduce
this. Are the indexes on the master table leading it astray, maybe?
Yep. Added the indexes and performance went right into the dumper. The
new plan on the new table, with the old data added in random order, now
looks like the old table's, only worse because it's on a slower drive.
Just to be complete, here's the plan: http://explain.depesz.com/s/PYH
Note that I created the new table with order by random() and created
indexes. Ran analyze on it, and the select plan looks similar now:
http://explain.depesz.com/s/bsE

So maybe I can make a test case now. But to summarize: when it can use
indexes, this query gets REAL slow because it lacks a materialize step.
Does that seem about right?
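
To restate the recipe that triggers it (the new table and index names
here are stand-ins, not what I actually used):

```sql
-- Copy the original data in random physical order:
SELECT * INTO dba.pp_test_wide2
FROM dba.pp_test_wide ORDER BY random();

-- The join query stays fast (materialize) until an index exists:
CREATE INDEX pp_test_wide2_tree_sortkey_idx
    ON dba.pp_test_wide2 (tree_sortkey);
ANALYZE dba.pp_test_wide2;
-- ...after which it reverts to the slow, non-materialized
-- nested loop plan.
```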


Re: subselect requires offset 0 for good performance.

From: Tom Lane
Scott Marlowe <scott.marlowe@gmail.com> writes:
> Yep. Added the indexes and performance went right into the dumper. New
> plan on new table with old data added in random order now looks like
> the old table, only worse because it's on a slower drive. Just to be
> complete here's the plan: http://explain.depesz.com/s/PYH Note that I
> created new table with order by random() and created indexes. Ran
> analyze on it, and the select plan looks similar now:
> http://explain.depesz.com/s/bsE

> So maybe I can make a test case now. But to summarize, when it can use
> indexes this query gets REAL slow because it lacks a materialize step.
> That seem about right?

Well, the plans shown here could *not* use a materialize step because the
inner scan makes use of a value from the current outer row.  The
materialized plan has to omit one of the index conditions from the inner
scan and then apply it as a join condition.

I suspect the real reason that the fast case is fast is that the inner
relation, even without the p.tree_sortkey >= pro_partners.tree_sortkey
condition, is empty, and thus the join runs very quickly.  But the planner
doesn't know that.  Its estimate of the row count isn't very large, but
it's definitely not zero, plus it thinks that adding the additional index
condition reduces the rowcount by a factor of 3 from there.  So it comes
to the wrong conclusion about the value of materializing a fixed inner
relation as opposed to using a parameterized indexscan.

Have you tried increasing the statistics targets for these join columns?
It's also possible that what you need to do is adjust the planner's
cost parameters ...
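
Concretely, something along these lines (column chosen from the plans
upthread; the target value is just a starting point):

```sql
-- Raise the per-column statistics target above the default of 100,
-- then re-analyze so the planner sees the larger histogram:
ALTER TABLE dba.pp_test_wide
    ALTER COLUMN tree_sortkey SET STATISTICS 1000;
ANALYZE dba.pp_test_wide;

-- Cost parameters can be experimented with per-session before
-- touching postgresql.conf:
SET random_page_cost = 2.0;
```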

            regards, tom lane


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
On Fri, Aug 2, 2013 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> Yep. Added the indexes and performance went right into the dumper. New
>> plan on new table with old data added in random order now looks like
>> the old table, only worse because it's on a slower drive. Just to be
>> complete here's the plan: http://explain.depesz.com/s/PYH Note that I
>> created new table with order by random() and created indexes. Ran
>> analyze on it, and the select plan looks similar now:
>> http://explain.depesz.com/s/bsE
>
>> So maybe I can make a test case now. But to summarize, when it can use
>> indexes this query gets REAL slow because it lacks a materialize step.
>> That seem about right?
>
> Well, the plans shown here could *not* use a materialize step because the
> inner scan makes use of a value from the current outer row.  The
> materialized plan has to omit one of the index conditions from the inner
> scan and then apply it as a join condition.
>
> I suspect the real reason that the fast case is fast is that the inner
> relation, even without the p.tree_sortkey >= pro_partners.tree_sortkey
> condition, is empty, and thus the join runs very quickly.  But the planner
> doesn't know that.  Its estimate of the row count isn't very large, but
> it's definitely not zero, plus it thinks that adding the additional index
> condition reduces the rowcount by a factor of 3 from there.  So it comes
> to the wrong conclusion about the value of materializing a fixed inner
> relation as opposed to using a parameterized indexscan.
>
> Have you tried increasing the statistics targets for these join columns?
> It's also possible that what you need to do is adjust the planner's
> cost parameters ...

I've tried changing random_page_cost, seq_page_cost, the cpu*
costs, and setting effective_cache_size all over the place, and it
stays just as slow.

Our default stats target is 100. I did a stats target = 1000 on the
three cols we access. Same terrible performance. Plan here:
http://explain.depesz.com/s/XVt
With stats target = 10000, same bad performance; plan:
http://explain.depesz.com/s/kJ54 pretty much the same. Setting
effective_cache_size='1000GB' makes no difference, still slow.

Setting random_page_cost to 75 makes it work, i.e. a materialize
shows up. Note that we run on FusionIO cards, and the whole db fits in
memory, so a very large effective cache size and a random page cost of
1.0 are actually accurate for our hardware.


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
On Fri, Aug 2, 2013 at 3:27 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Aug 2, 2013 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Scott Marlowe <scott.marlowe@gmail.com> writes:
>>> Yep. Added the indexes and performance went right into the dumper. New
>>> plan on new table with old data added in random order now looks like
>>> the old table, only worse because it's on a slower drive. Just to be
>>> complete here's the plan: http://explain.depesz.com/s/PYH Note that I
>>> created new table with order by random() and created indexes. Ran
>>> analyze on it, and the select plan looks similar now:
>>> http://explain.depesz.com/s/bsE
>>
>>> So maybe I can make a test case now. But to summarize, when it can use
>>> indexes this query gets REAL slow because it lacks a materialize step.
>>> That seem about right?
>>
>> Well, the plans shown here could *not* use a materialize step because the
>> inner scan makes use of a value from the current outer row.  The
>> materialized plan has to omit one of the index conditions from the inner
>> scan and then apply it as a join condition.
>>
>> I suspect the real reason that the fast case is fast is that the inner
>> relation, even without the p.tree_sortkey >= pro_partners.tree_sortkey
>> condition, is empty, and thus the join runs very quickly.  But the planner
>> doesn't know that.  Its estimate of the row count isn't very large, but
>> it's definitely not zero, plus it thinks that adding the additional index
>> condition reduces the rowcount by a factor of 3 from there.  So it comes
>> to the wrong conclusion about the value of materializing a fixed inner
>> relation as opposed to using a parameterized indexscan.
>>
>> Have you tried increasing the statistics targets for these join columns?
>> It's also possible that what you need to do is adjust the planner's
>> cost parameters ...
>
> I've tried changing random_page_cost, sequential_page_cost, the cpu*
> costs, and setting effective_cache_size all over the place and it
> stays just as slow.
>
> our default stats target is 100. Did a stats target = 1000 on the
> three cols we access. Same terrible performance. Plan here:
> http://explain.depesz.com/s/XVt
> stats target=10000, same bad performance, plan:
> http://explain.depesz.com/s/kJ54 pretty much the same.  Setting
> effective_cache_size='1000GB' make no difference, still slow.
>
> If I set random_page_cost to 75 makes it work, i.e. a materialize
> shows up. Note that we run on FusionIO cards, and the whole db fits in
> memory, so a very large effective cache size and random page cost of
> 1.0 is actually accurate for our hardware.

OK I've done some more research. Here's the plan for it with a low
random page cost:

Nested Loop Anti Join  (cost=0.00..1862107.69 rows=9274 width=994)
(actual time=0.181..183413.479 rows=17391 loops=1)
   Join Filter: (p.tree_sortkey <= tree_right(pp_test_wide.tree_sortkey))
   ->  Index Scan using pp_test_wide_tree_sortkey_idx on pp_test_wide
p  (cost=0.00..10108.15 rows=10433 width=983) (actual
time=0.164..218.230 rows=17391 loops=1)
         Index Cond: ((tree_sortkey >=
B'00000000000101010000010001010100'::bit varying) AND (tree_sortkey <=
B'0000000000010101000001000101010011111111111111111111111111111111'::bit
varying) AND (tree_sortkey >= B'00000000'::bit varying) AND
(tree_sortkey <= B'0000000011111111111111111111111111111111'::bit
varying))
         Filter: (deleted_at IS NULL)
   ->  Index Scan using pp_test_wide_tree_sortkey_idx on pp_test_wide
(cost=0.00..194.86 rows=14 width=11) (actual time=10.530..10.530
rows=0 loops=17391)
         Index Cond: ((pp_test_wide.tree_sortkey >=
B'00000000000101010000010001010100'::bit varying) AND
(pp_test_wide.tree_sortkey <=
B'0000000000010101000001000101010011111111111111111111111111111111'::bit
varying) AND (p.tree_sortkey >= pp_test_wide.tree_sortkey))
         Filter: ((pp_test_wide.product_name IS NOT NULL) AND
(pp_test_wide.tree_sortkey <> B'00000000000101010000010001010100'::bit
varying))
 Total runtime: 183421.226 ms

Note that the bottom half gets 0 rows across 17391 loops, at about
10ms each, which adds up to just under the total runtime of the nested
loop anti-join, ~183,000 ms.  If I crank random_page_cost up to, say,
100, I get this plan:

Nested Loop Anti Join  (cost=15202.10..144269.58 rows=9292 width=997)
(actual time=71.281..271.018 rows=17391 loops=1)
   Join Filter: ((p.tree_sortkey >= pp_test_wide.tree_sortkey) AND
(p.tree_sortkey <= tree_right(pp_test_wide.tree_sortkey)))
   ->  Seq Scan on pp_test_wide p  (cost=0.00..16009.92 rows=10453
width=986) (actual time=0.024..183.341 rows=17391 loops=1)
         Filter: ((deleted_at IS NULL) AND (tree_sortkey >=
B'00000000000101010000010001010100'::bit varying) AND (tree_sortkey <=
B'0000000000010101000001000101010011111111111111111111111111111111'::bit
varying) AND (tree_sortkey >= B'00000000'::bit varying) AND
(tree_sortkey <= B'0000000011111111111111111111111111111111'::bit
varying))
   ->  Materialize  (cost=15202.10..15202.54 rows=44 width=11) (actual
time=0.004..0.004 rows=0 loops=17391)
         ->  Seq Scan on pp_test_wide  (cost=0.00..15202.06 rows=44
width=11) (actual time=71.245..71.245 rows=0 loops=1)
               Filter: ((product_name IS NOT NULL) AND (tree_sortkey
>= B'00000000000101010000010001010100'::bit varying) AND (tree_sortkey
<= B'0000000000010101000001000101010011111111111111111111111111111111'::bit
varying) AND (tree_sortkey <> B'00000000000101010000010001010100'::bit
varying))
 Total runtime: 272.204 ms

Note that here we feed two seq scans into the anti-join, and the cost
of both seq scans is quite low.  I've played around with the
selectivity so that the inner scan delivers 53 rows instead of none,
and the query performance was about the same in both cases (i.e.
random_page_cost = 1 is super slow, with lots of 10ms loops;
random_page_cost = 100 is fast, with two seq scans).

It seems to me that deciding to do the index scan is a mistake whether
the lower portion estimates 14 rows or 0, since the planner knows it
is going to run one inner index scan per row from the upper
index scan. I.e. it should be guesstimating that it's gonna do ~10,000
loops of the inner index scan. It's like somehow the query planner
thinks that each loop is going to be MUCH faster than 10ms.

I can create an extract of this table for testing if someone wants one
to explore with.  Let me know.


Re: subselect requires offset 0 for good performance.

From: Vik Fearing
On 08/02/2013 09:37 AM, Vik Fearing wrote:
> EXPLAIN ANALYZE
> WITH RECURSIVE
> x (start_time) AS
> (
>     SELECT generate_series(1, 1000000)
> ),
> t (time, timeround) AS
> (
>     SELECT time, time - time % 900000 AS timeround
>     FROM (SELECT min(start_time) AS time FROM x) AS tmp
>     UNION ALL
>     SELECT time, time - time % 900000
>     FROM (SELECT (SELECT min(start_time) AS time
>                   FROM x
>                   WHERE start_time >= t.timeround + 900000)
>           FROM t
>           WHERE t.time IS NOT NULL OFFSET 0
>          ) tmp
> )
> SELECT count(*) FROM t WHERE time IS NOT NULL;
>
> If you remove the OFFSET 0, you'll see two more subplans (because "time"
> is referenced three times).

Is this not interesting to anyone?

Vik


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
OK I'm bumping this one last time in the hopes that someone has an
idea what to do to fix it.

Query plan: http://explain.depesz.com/s/kJ54

This query takes 180 seconds. It loops 17391 times across the lower
index using entries from the upper index. That seems buggy to me.
While the exact estimates are off, the fact that it estimates 10,433
rows in the above index scan means that it's expecting to do that many
loops on the bottom index scan.

This is one of those queries where adding an offset 0 to the inner
select reduces run time from 180s to 0.2s. And I can't imagine why the
planner thinks running 10k loops on an index scan is a good idea.
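
For completeness, the fast variant is the same query from upthread
with the fence added to the subselect:

```sql
SELECT * FROM dba.pp_test_wide p LEFT JOIN
    (
    SELECT tree_sortkey FROM dba.pp_test_wide
    WHERE tree_sortkey BETWEEN '00000000000101010000010001010100'::VARBIT
        AND public.tree_right('00000000000101010000010001010100'::VARBIT)
        AND product_name IS NOT NULL
        AND tree_sortkey <> '00000000000101010000010001010100'::VARBIT
    OFFSET 0  -- prevents the planner from flattening the subselect
    ) pp
ON p.tree_sortkey BETWEEN pp.tree_sortkey AND public.tree_right(pp.tree_sortkey)
WHERE
    p.tree_sortkey BETWEEN '00000000000101010000010001010100'::VARBIT
        AND public.tree_right('00000000000101010000010001010100'::VARBIT)
    AND p.tree_sortkey BETWEEN '00000000'::VARBIT
        AND public.tree_right('00000000'::VARBIT)
    AND p.deleted_at IS NULL
    AND pp.tree_sortkey IS NULL;
```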


Re: subselect requires offset 0 for good performance.

From: Tom Lane
Scott Marlowe <scott.marlowe@gmail.com> writes:
> OK I'm bumping this one last time in the hopes that someone has an
> idea what to do to fix it.

> Query plan: http://explain.depesz.com/s/kJ54

> This query takes 180 seconds. It loops 17391 times across the lower
> index using entries from the upper index. That seems buggy to me.

There isn't all that much that the planner can do with that query.  There
are no equality join clauses, so no possibility of a merge or hash join;
the only way to implement the join is a nestloop.

Things would probably be better if it left out the one join clause it's
putting into the inner indexscan condition, so it could materialize the
result of the inner indexscan and then do a nestloop join against the
Material node.  I'd expect 9.0 and up to consider that a good idea ...
but looking back, I see this is 8.4, which means you're probably out of
luck on getting a better plan.  8.4's nearly out of warranty anyway ---
consider upgrading.

            regards, tom lane


Re: subselect requires offset 0 for good performance.

From: Scott Marlowe
On Tue, Aug 13, 2013 at 4:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> OK I'm bumping this one last time in the hopes that someone has an
>> idea what to do to fix it.
>
>> Query plan: http://explain.depesz.com/s/kJ54
>
>> This query takes 180 seconds. It loops 17391 times across the lower
>> index using entries from the upper index. That seems buggy to me.
>
> There isn't all that much that the planner can do with that query.  There
> are no equality join clauses, so no possibility of a merge or hash join;
> the only way to implement the join is a nestloop.
>
> Things would probably be better if it left out the one join clause it's
> putting into the inner indexscan condition, so it could materialize the
> result of the inner indexscan and then do a nestloop join against the
> Material node.  I'd expect 9.0 and up to consider that a good idea ...
> but looking back, I see this is 8.4, which means you're probably out of
> luck on getting a better plan.  8.4's nearly out of warranty anyway ---
> consider upgrading.

Thanks for the hints, we'll try them. As for the upgrade: a move to
9.1, possibly 9.2, is already in the planning stages. But you know how
production upgrades go... slow.