Thread: Query plan good in 8.4, bad in 9.2 and better in 9.3

Query plan good in 8.4, bad in 9.2 and better in 9.3

From
Scott Marlowe
Date:
OK so we have a query that does OK in 8.4, goes to absolute crap in
9.2 and then works great in 9.3. Thing is we've spent several months
regression testing 9.2 and no time testing 9.3, so we can't just "go
to 9.3" in an afternoon. But we might have to. 9.2 seems hopelessly
broken here.

The query looks something like this:

SELECT COUNT(*) FROM u, ug
WHERE u.ugid = ug.id
AND NOT u.d
AND ug.somefield IN  (SELECT somefunction(12345));

In 8.4 we get this plan http://explain.depesz.com/s/r3hF which takes ~5ms
In 9.2 we get this plan http://explain.depesz.com/s/vM7 which takes ~10s
In 9.3 we get this plan http://explain.depesz.com/s/Wub which takes ~0.35ms

The data sets are identical, the schemas are identical. Making changes
to random_page_cost, sequential_page_cost and various other tuning
parameters don't make it any better.

PG versions: 8.4.20, 9.2.8,  9.3.4

Adding a limit to the function DOES make 9.2 better, ala:

SELECT COUNT(*) FROM u, ug
WHERE u.ugid = ug.id
AND NOT u.d
AND ug.somefield IN  (SELECT somefunction(12345) limit 199);

If the limit is 200 the bad plan shows up again.

Question, is this a known issue with 9.2? If so is it something that
will one day be fixed or are we stuck with it? Is there a workaround
to make it better? Note: I'd rather not have to compile 9.2 from
source with a patch, but at this point that would be acceptable over
"you're stuck with it".

--
To understand recursion, one must first understand recursion.


Re: Query plan good in 8.4, bad in 9.2 and better in 9.3

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> OK so we have a query that does OK in 8.4, goes to absolute crap in
> 9.2 and then works great in 9.3. Thing is we've spent several months
> regression testing 9.2 and no time testing 9.3, so we can't just "go
> to 9.3" in an afternoon. But we might have to. 9.2 seems hopelessly
> broken here.

> The query looks something like this:

> SELECT COUNT(*) FROM u, ug
> WHERE u.ugid = ug.id
> AND NOT u.d
> AND ug.somefield IN  (SELECT somefunction(12345));

You really should show us somefunction's definition if you want
useful comments.  I gather however that it returns a set.  8.4
seems to be planning on the assumption that the set contains
only one row, which is completely unjustified in general though
it happens to be true in your example.  9.2 is assuming 1000 rows
in the set, and getting a sucky plan because that's wrong.  9.3
is still assuming that; and I rather doubt that you are really
testing 9.3 on the same data, because 9.2 is finding millions of
rows in a seqscan of u while 9.3 is finding none in the exact
same seqscan.

I'd suggest affixing a ROWS estimate to somefunction, or better
declaring it to return singleton not set if that's actually
always the case.

            regards, tom lane


Re: Query plan good in 8.4, bad in 9.2 and better in 9.3

From
Jeff Janes
Date:
On Thu, May 15, 2014 at 9:35 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
OK so we have a query that does OK in 8.4, goes to absolute crap in
9.2 and then works great in 9.3. Thing is we've spent several months
regression testing 9.2 and no time testing 9.3, so we can't just "go
to 9.3" in an afternoon. But we might have to. 9.2 seems hopelessly
broken here.

The query looks something like this:

SELECT COUNT(*) FROM u, ug
WHERE u.ugid = ug.id
AND NOT u.d
AND ug.somefield IN  (SELECT somefunction(12345));

In 8.4 we get this plan http://explain.depesz.com/s/r3hF which takes ~5ms
In 9.2 we get this plan http://explain.depesz.com/s/vM7 which takes ~10s
In 9.3 we get this plan http://explain.depesz.com/s/Wub which takes ~0.35ms

Based on the actual row counts given in the seq scan on u, , in 9.2, u contains millions of rows.  In 9.3, it contains zero rows.

 

The data sets are identical, the schemas are identical.

Please double check that.

 
Cheers,

Jeff

Re: Query plan good in 8.4, bad in 9.2 and better in 9.3

From
Scott Marlowe
Date:
On Thu, May 15, 2014 at 10:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> OK so we have a query that does OK in 8.4, goes to absolute crap in
>> 9.2 and then works great in 9.3. Thing is we've spent several months
>> regression testing 9.2 and no time testing 9.3, so we can't just "go
>> to 9.3" in an afternoon. But we might have to. 9.2 seems hopelessly
>> broken here.
>
>> The query looks something like this:
>
>> SELECT COUNT(*) FROM u, ug
>> WHERE u.ugid = ug.id
>> AND NOT u.d
>> AND ug.somefield IN  (SELECT somefunction(12345));
>
> You really should show us somefunction's definition if you want
> useful comments.  I gather however that it returns a set.  8.4
> seems to be planning on the assumption that the set contains
> only one row, which is completely unjustified in general though
> it happens to be true in your example.  9.2 is assuming 1000 rows
> in the set, and getting a sucky plan because that's wrong.  9.3
> is still assuming that; and I rather doubt that you are really
> testing 9.3 on the same data, because 9.2 is finding millions of
> rows in a seqscan of u while 9.3 is finding none in the exact
> same seqscan.
>
> I'd suggest affixing a ROWS estimate to somefunction, or better
> declaring it to return singleton not set if that's actually
> always the case.

Well great, now I look like an idiot. Last time I trust someone else
to set up my test servers.

Anyway, yeah, affixing a rows estimate fixes this for us 100%. So thanks!