Re: subselect requires offset 0 for good performance. - Mailing list pgsql-performance

From: Scott Marlowe
Subject: Re: subselect requires offset 0 for good performance.
Msg-id: CAOR=d=0FhMvYC5iFMRhJ_exQKYQE1_CzRBsAUhgQMGCe1mFYRw@mail.gmail.com
In response to: Re: subselect requires offset 0 for good performance. (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On Fri, Aug 2, 2013 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> Yep. Added the indexes and performance went right into the dumper. New
>> plan on new table with old data added in random order now looks like
>> the old table, only worse because it's on a slower drive. Just to be
>> complete here's the plan: http://explain.depesz.com/s/PYH Note that I
>> created new table with order by random() and created indexes. Ran
>> analyze on it, and the select plan looks similar now:
>> http://explain.depesz.com/s/bsE
>
>> So maybe I can make a test case now. But to summarize, when it can use
>> indexes this query gets REAL slow because it lacks a materialize step.
>> That seem about right?
>
> Well, the plans shown here could *not* use a materialize step because the
> inner scan makes use of a value from the current outer row.  The
> materialized plan has to omit one of the index conditions from the inner
> scan and then apply it as a join condition.
>
> I suspect the real reason that the fast case is fast is that the inner
> relation, even without the p.tree_sortkey >= pro_partners.tree_sortkey
> condition, is empty, and thus the join runs very quickly.  But the planner
> doesn't know that.  Its estimate of the row count isn't very large, but
> it's definitely not zero, plus it thinks that adding the additional index
> condition reduces the rowcount by a factor of 3 from there.  So it comes
> to the wrong conclusion about the value of materializing a fixed inner
> relation as opposed to using a parameterized indexscan.
>
> Have you tried increasing the statistics targets for these join columns?
> It's also possible that what you need to do is adjust the planner's
> cost parameters ...

I've tried changing random_page_cost, seq_page_cost, the cpu_* costs,
and effective_cache_size all over the place, and the query stays just
as slow.
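
For the record, this is roughly the kind of thing I was trying (exact
values varied from run to run; these are representative settings, not
the precise ones from any single test):

    SET random_page_cost = 1.0;         -- random reads ~ sequential on FusionIO
    SET seq_page_cost = 1.0;
    SET cpu_tuple_cost = 0.01;          -- stock default
    SET cpu_index_tuple_cost = 0.005;   -- stock default
    SET cpu_operator_cost = 0.0025;     -- stock default
    SET effective_cache_size = '512GB'; -- whole db fits in RAM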

Our default stats target is 100. I set a stats target of 1000 on the
three columns the query accesses; performance was just as terrible.
Plan here: http://explain.depesz.com/s/XVt
With a stats target of 10000 the plan (http://explain.depesz.com/s/kJ54)
is pretty much the same, and performance is still bad. Setting
effective_cache_size = '1000GB' makes no difference either; still slow.
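
The stats bump itself was done per-column, along these lines
(tree_sortkey is one of the three columns; the other two were handled
the same way):

    ALTER TABLE pro_partners ALTER COLUMN tree_sortkey SET STATISTICS 1000;
    ANALYZE pro_partners;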

Setting random_page_cost to 75 does make it work, i.e. a materialize
step shows up in the plan. But note that we run on FusionIO cards and
the whole db fits in memory, so a very large effective_cache_size and a
random_page_cost of 1.0 are actually accurate for our hardware.
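
If anyone wants to reproduce the comparison, the materialize plan can
be coaxed out for a single session without touching the real config
(query elided here, since it's the same one as in the plans above):

    BEGIN;
    SET LOCAL random_page_cost = 75;   -- absurd for this hardware; diagnostic only
    EXPLAIN ANALYZE <problem query>;   -- plan now includes a Materialize node
    ROLLBACK;                          -- SET LOCAL reverts with the transaction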

