Thread: 8.4/9.0 simple query performance regression

8.4/9.0 simple query performance regression

From
Josh Berkus
Date:
All,

Just got this simple case off IRC today:

8.4.4
This plan completes in 100ms:

old_prod=# explain analyze select email from u_contact where id not in
(select contact_id from u_user);
                                                     QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
 Seq Scan on u_contact  (cost=2217.72..4759.74 rows=35560 width=22)
(actual time=61.283..107.169 rows=4521 loops=1)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Seq Scan on u_user  (cost=0.00..2051.38 rows=66538 width=8)
(actual time=0.034..33.303 rows=66978 loops=1)
 Total runtime: 108.001 ms


9.0.2
This plan does not complete in 15 minutes or more:

new_prod=# explain select email from u_contact where id not in (select
contact_id from u_user);
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on u_contact  (cost=0.00..100542356.74 rows=36878 width=22)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..2552.56 rows=69504 width=8)
           ->  Seq Scan on u_user  (cost=0.00..1933.04 rows=69504 width=8)
(5 rows)

I'm at a bit of a loss as to what's happening here.  I'd guess another
failure of a bail-out-early plan, but I can't see how that would work
with this query.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: 8.4/9.0 simple query performance regression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Just got this simple case off IRC today:
> [ hashed versus non-hashed subplan ]
> I'm at a bit of a loss as to what's happening here.

Possibly work_mem is smaller in the second installation?

(If I'm counting on my fingers right, you'd need a setting of at least a
couple MB to let it choose a hashed subplan for this case.)

            regards, tom lane

Re: 8.4/9.0 simple query performance regression

From
Vitalii Tymchyshyn
Date:
07.06.11 00:45, Josh Berkus написав(ла):
> All,
>
> Just got this simple case off IRC today:
>
> 8.4.4
> This plan completes in 100ms:
>     Filter: (NOT (hashed SubPlan 1))

> 9.0.2
> This plan does not complete in 15 minutes or more:
>     Filter: (NOT (SubPlan 1))
"Hashed" is the key. Hashed subplans usually has much better performance.
You need to increase work_mem. I suppose it is in default state as you
need not too much memory for hash of 70K integer values.
BTW: Why do it want to materialize a result of seq scan without filter.
I can see no benefits (or is it more narrow rows?)

Best regards, Vitalii Tymchyshyn