Thread: Slow query with join

Slow query with join

From

Marc Watson

Date:

16 March 2015, 22:24:43

Hello all,
I am using PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit, as downloaded from EnterpriseDB, and is running
onmy dev system under Win 7 64-bit. 
I hope someone can help me with a problem I'm having when joining a view with a table. The view is somewhat involved,
butI can provide the details if necessary 
A query on the table is quick (16 ms):
explain analyze select ir_actor_id from f_intervenant_ref where ir_dos_id = '5226' order by ir_actor_id;

"Sort  (cost=17.28..17.30 rows=8 width=4) (actual time=0.032..0.033 rows=8 loops=1)"
"  Sort Key: ir_actor_id"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  Index Scan using ir_dos_id_idx on f_intervenant_ref  (cost=0.28..17.16 rows=8 width=4) (actual time=0.019..0.024
rows=8loops=1)" 
"        Index Cond: ((ir_dos_id)::text = '5226'::text)"
"Planning time: 0.180 ms"
"Execution time: 0.049 ms"

A query on the view is also quick (31 ms), using the results from the previous query on the table:
explain analyze select * from v_actor where v_actor.actor_id in(77170,77170,77184,77184,77185,77185,77186,77186);
"Nested Loop Left Join  (cost=0.86..385.18 rows=8 width=138) (actual time=2.819..9.652 rows=4 loops=1)"
"  Join Filter: (actor.type = 'physical'::business.actor_type)"
"  ->  Nested Loop Left Join  (cost=0.58..105.16 rows=8 width=114) (actual time=0.015..0.032 rows=4 loops=1)"
"        Join Filter: (actor.type = 'moral'::business.actor_type)"
"        ->  Index Scan using actor_pkey on actor  (cost=0.29..38.66 rows=8 width=78) (actual time=0.010..0.018 rows=4
loops=1)"
"              Index Cond: (id = ANY ('{77170,77170,77184,77184,77185,77185,77186,77186}'::integer[]))"
"              Filter: (deleted IS FALSE)"
"        ->  Index Scan using moral_actor_pkey on moral_actor  (cost=0.28..8.30 rows=1 width=40) (actual
time=0.002..0.002rows=0 loops=4)" 
"              Index Cond: (id = actor.id)"
"  ->  Index Scan using physical_actor_pkey on physical_actor  (cost=0.29..8.30 rows=1 width=28) (actual
time=0.003..0.004rows=1 loops=4)" 
"        Index Cond: (id = actor.id)"
"  SubPlan 1"
"    ->  Limit  (cost=8.30..8.31 rows=1 width=8) (never executed)"
"          ->  Sort  (cost=8.30..8.31 rows=1 width=8) (never executed)"
"                Sort Key: contact.rank"
"                ->  Index Scan using contact_actor_idx on contact  (cost=0.28..8.29 rows=1 width=8) (never executed)"
"                      Index Cond: (actor_id = actor.id)"
"                      Filter: (NOT deleted)"
"  SubPlan 2"
"    ->  Limit  (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=4)"
"          ->  Sort  (cost=8.30..8.31 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=4)"
"                Sort Key: contact_1.rank"
"                Sort Method: quicksort  Memory: 25kB"
"                ->  Index Scan using contact_actor_idx on contact contact_1  (cost=0.28..8.29 rows=1 width=8) (actual
time=0.002..0.002rows=0 loops=4)" 
"                      Index Cond: (actor_id = actor.id)"
"                      Filter: (NOT deleted)"
"Planning time: 0.721 ms"
"Execution time: 9.759 ms"

However, when I combine the two queries into one, the result set takes 6742 ms:
explain analyze select * from v_actor where v_actor.actor_id in(select ir_actor_id from f_intervenant_ref where
ir_dos_id= '5226' order by ir_actor_id); 
"Merge Semi Join  (cost=71.79..108061.92 rows=8 width=1461) (actual time=7884.994..7927.699 rows=4 loops=1)"
"  Merge Cond: (actor.id = f_intervenant_ref.ir_actor_id)"
"  ->  Merge Left Join  (cost=0.85..554314.28 rows=20670 width=138) (actual time=2.820..7926.001 rows=3072 loops=1)"
"        Merge Cond: (actor.id = physical_actor.id)"
"        Join Filter: (actor.type = 'physical'::business.actor_type)"
"        ->  Merge Left Join  (cost=0.57..1679.17 rows=20670 width=114) (actual time=0.016..4.768 rows=3072 loops=1)"
"              Merge Cond: (actor.id = moral_actor.id)"
"              Join Filter: (actor.type = 'moral'::business.actor_type)"
"              ->  Index Scan using actor_pkey on actor  (cost=0.29..1275.50 rows=20670 width=78) (actual
time=0.008..3.190rows=3072 loops=1)" 
"                    Filter: (deleted IS FALSE)"
"              ->  Index Scan using moral_actor_pkey on moral_actor  (cost=0.28..268.78 rows=5548 width=40) (actual
time=0.006..0.006rows=1 loops=1)" 
"        ->  Index Scan using physical_actor_pkey on physical_actor  (cost=0.29..725.98 rows=15122 width=28) (actual
time=0.003..3.208rows=3072 loops=1)" 
"        SubPlan 1"
"          ->  Limit  (cost=8.30..8.31 rows=1 width=8) (never executed)"
"                ->  Sort  (cost=8.30..8.31 rows=1 width=8) (never executed)"
"                      Sort Key: contact.rank"
"                      ->  Index Scan using contact_actor_idx on contact  (cost=0.28..8.29 rows=1 width=8) (never
executed)"
"                            Index Cond: (actor_id = actor.id)"
"                            Filter: (NOT deleted)"
"        SubPlan 2"
"          ->  Limit  (cost=8.30..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=3072)"
"                ->  Sort  (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3072)"
"                      Sort Key: contact_1.rank"
"                      Sort Method: quicksort  Memory: 25kB"
"                      ->  Index Scan using contact_actor_idx on contact contact_1  (cost=0.28..8.29 rows=1 width=8)
(actualtime=0.003..0.003 rows=0 loops=3072)" 
"                            Index Cond: (actor_id = actor.id)"
"                            Filter: (NOT deleted)"
"  ->  Materialize  (cost=17.28..17.40 rows=8 width=4) (actual time=0.024..0.027 rows=8 loops=1)"
"        ->  Sort  (cost=17.28..17.30 rows=8 width=4) (actual time=0.021..0.022 rows=8 loops=1)"
"              Sort Key: f_intervenant_ref.ir_actor_id"
"              Sort Method: quicksort  Memory: 25kB"
"              ->  Index Scan using ir_dos_id_idx on f_intervenant_ref  (cost=0.28..17.16 rows=8 width=4) (actual
time=0.012..0.017rows=8 loops=1)" 
"                    Index Cond: ((ir_dos_id)::text = '5226'::text)"
"Planning time: 0.820 ms"
"Execution time: 7927.838 ms"

Any suggestions to help me speed this up will be greatly appreciated.



Mark Watson
Service au client - R&D
www.jurisconcept.ca

Re: Slow query with join

From

"David G. Johnston"

Date:

16 March 2015, 23:40:04

On Mon, Mar 16, 2015 at 11:50 AM, Marc Watson <mark.watson@jurisconcept.ca> wrote:

Hello all,
I am using PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit, as downloaded from EnterpriseDB, and is running on my dev system under Win 7 64-bit.

[...]

However, when I combine the two queries into one, the result set takes 6742 ms:
explain analyze select * from v_actor where v_actor.actor_id in(select ir_actor_id from f_intervenant_ref where ir_dos_id = '5226' order by ir_actor_id);

You might want to consider whether the following is acceptable; but it would depend on the relationship between f_intervenant_ref and v_actor:

SELECT *

FROM v_actor

JOIN f_intervenant_ref ON (actor_id = ir_actor_id)

WHERE ir_dos_id = '5226';

"IN" has issues due to necessary consideration of possible NULLs in the list.

Furthermore; even in your original query there is no value to incorporating an ORDER BY into the IN subquery.

I suspect that this second problem is preventing the planner from pushing the subquery down into the view and so is forced to perform a "Merge Semi Join" against the full (and thus expensive) view while the constants in the second query can be pushed down and the planner is able to choose the "Nested Loop Left Join" over 8 keys (4 rows) which ends up being very fast.

The other option is to use EXISTS:

SELECT *

FROM v_actor

WHERE EXIST (SELECT 1 FROM f_intervenant_ref WHERE actor_id = ir_actor_id AND ir_dos_id = '5226')

"Merge Semi Join (cost=71.79..108061.92 rows=8 width=1461) (actual time=7884.994..7927.699 rows=4 loops=1)"
" Merge Cond: (actor.id = f_intervenant_ref.ir_actor_id)"
" -> Merge Left Join (cost=0.85..554314.28 rows=20670 width=138) (actual time=2.820..7926.001 rows=3072 loops=1)"

[...]

" -> Materialize (cost=17.28..17.40 rows=8 width=4) (actual time=0.024..0.027 rows=8 loops=1)"
" -> Sort (cost=17.28..17.30 rows=8 width=4) (actual time=0.021..0.022 rows=8 loops=1)"
" Sort Key: f_intervenant_ref.ir_actor_id"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using ir_dos_id_idx on f_intervenant_ref (cost=0.28..17.16 rows=8 width=4) (actual time=0.012..0.017 rows=8 loops=1)"
" Index Cond: ((ir_dos_id)::text = '5226'::text)"
"Planning time: 0.820 ms"
"Execution time: 7927.838 ms"

Any suggestions to help me speed this up will be greatly appreciated.

David J.

Re: Slow query with join

From

Tomas Vondra

Date:

16 March 2015, 23:50:47

On 16.3.2015 19:50, Marc Watson wrote:
> Hello all,
> I am using PostgreSQL 9.4.1, compiled by Visual C++ build 1800,
64-bit, as downloaded from EnterpriseDB, and is running on my dev system
under Win 7 64-bit.
> I hope someone can help me with a problem I'm having when joining a
view with a table. The view is somewhat involved, but I can provide the
details if necessary
> A query on the table is quick (16 ms):
<
> explain analyze select ir_actor_id from f_intervenant_ref where ir_dos_id = '5226' order by ir_actor_id;
>
> "Sort  (cost=17.28..17.30 rows=8 width=4) (actual time=0.032..0.033 rows=8 loops=1)"
> "  Sort Key: ir_actor_id"
> "  Sort Method: quicksort  Memory: 25kB"
> "  ->  Index Scan using ir_dos_id_idx on f_intervenant_ref  (cost=0.28..17.16 rows=8 width=4) (actual
time=0.019..0.024rows=8 loops=1)" 
> "        Index Cond: ((ir_dos_id)::text = '5226'::text)"
> "Planning time: 0.180 ms"
> "Execution time: 0.049 ms"
> ..

ISTM the database is applying the IN() condition last, i.e. it executes

   SELECT * FROM v_actor

and then proceeds to filter the result. I'd bet if you measure time for
that (SELECT * FROM v_actor) you'll get ~7 seconds.

First, get rid of the ORDER BY clauses in the subselects - it's
completely pointless, and might prevent proper optimization (e.g.
replacing the IN() with optimized joins.

I.e. try this:

 SELECT * FROM v_actor
  WHERE v_actor.actor_id IN (SELECT ir_actor_id FROM f_intervenant_ref
                              WHERE ir_dos_id = '5226');

I'd also try replacing this with EXISTS

 SELECT * FROM v_actor
  WHERE EXISTS (SELECT 1 FROM f_intervenant_ref
                 WHERE (actor_id = ir_actor_id)
                   AND (ir_dos_id = '5226'));

or even an explicit join

 SELECT v_actor.* FROM v_actor JOIN f_intervenant_ref
                    ON (actor_id = ir_actor_id)
                 WHERE ir_dos_id = '5226');

That might behave a bit differently if there are multiple
f_intervenant_ref rows matching the actor. If that's the case, a simple
DISTINCT should fix that.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Slow query with join

From

Tom Lane

Date:

17 March 2015, 00:06:51

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 16.3.2015 19:50, Marc Watson wrote:
>> I hope someone can help me with a problem I'm having when joining a
>> view with a table. The view is somewhat involved, but I can provide the
>> details if necessary

> First, get rid of the ORDER BY clauses in the subselects - it's
> completely pointless, and might prevent proper optimization (e.g.
> replacing the IN() with optimized joins.

I'm suspicious that the cause may be an ORDER BY in the view.  It's
hard to tell when we've not seen the view definition, but I see that both
plans we've been shown are going to produce output sorted by actor.id.
Maybe that's happenstance, or maybe not.

            regards, tom lane

Re: Slow query with join

From

Marc Watson

Date:

17 March 2015, 16:01:34

>-----Message d'origine-----
>De : Tom Lane [mailto:tgl@sss.pgh.pa.us]
>Envoyé : March-16-15 5:07 PM
>À : Tomas Vondra
>Cc : pgsql-general@postgresql.org; Marc Watson
>Objet : Re: [GENERAL] Slow query with join
>
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 16.3.2015 19:50, Marc Watson wrote:
>>> I hope someone can help me with a problem I'm having when joining a
>>> view with a table. The view is somewhat involved, but I can provide the
>>> details if necessary
>
>> First, get rid of the ORDER BY clauses in the subselects - it's
>> completely pointless, and might prevent proper optimization (e.g.
>> replacing the IN() with optimized joins.
>
>I'm suspicious that the cause may be an ORDER BY in the view.  It's
>hard to tell when we've not seen the view definition, but I see that both
>plans we've been shown are going to produce output sorted by actor.id.
>Maybe that's happenstance, or maybe not.
>
>            regards, tom lane

Thanks for replying.
First of all, there is an ORDER BY in the view - ORDER BY actor.id. Removing this, the query takes about 47 secs.,
whichis also the same as if I simply do a select * from v_actor, with or without the ORDER BY in the view. 
It is something in the view, which I will explore further. If I create a materialized view mv_actor, select * from
mv_actortakes 8 sec and my original query using the materialized view  takes 16ms. 

David G. Johnston <david.g.johnston@gmail.com> wrote:
>You might want to consider whether the following is acceptable; but it would depend on the relationship between
f_intervenant_refand v_actor: 
>
>SELECT *
>FROM v_actor
>JOIN f_intervenant_ref ON (actor_id = ir_actor_id)
>WHERE ir_dos_id = '5226'

The relationship between f_intervenant_ref and v_actor is simple. ir_actor_id is an integer that contains the value
fromthe table actor.id, a serial and the primary key.  This gives the same 7 secs, or 49 secs with out the ORDER BY in
theview. The explain analyze output (with the ORDER BY in the view) is : 
 "Merge Join  (cost=54.78..108097.04 rows=8 width=1540) (actual time=6925.158..6962.631 rows=8 loops=1)"
"  Merge Cond: (actor.id = f_intervenant_ref.ir_actor_id)"
"  ->  Merge Left Join  (cost=0.85..554262.60 rows=20670 width=138) (actual time=2.840..6960.625 rows=3072 loops=1)"
"        Merge Cond: (actor.id = physical_actor.id)"
"        ->  Merge Left Join  (cost=0.57..1665.30 rows=20670 width=114) (actual time=0.014..3.791 rows=3072 loops=1)"
"              Merge Cond: (actor.id = moral_actor.id)"
"              ->  Index Scan using actor_pkey on actor  (cost=0.29..1275.50 rows=20670 width=78) (actual
time=0.006..1.667rows=3072 loops=1)" 
"              ->  Index Scan using moral_actor_pkey on moral_actor  (cost=0.28..268.78 rows=5548 width=40) (actual
time=0.005..0.005rows=1 loops=1)" 
"        ->  Index Scan using physical_actor_pkey on physical_actor  (cost=0.29..725.98 rows=15122 width=28) (actual
time=0.003..3.072rows=3072 loops=1)" 
"        SubPlan 1"
"          ->  Limit  (cost=8.30..8.31 rows=1 width=8) (never executed)"
"                ->  Sort  (cost=8.30..8.31 rows=1 width=8) (never executed)"
"                      Sort Key: contact.rank"
"                      ->  Index Scan using contact_actor_idx on contact  (cost=0.28..8.29 rows=1 width=8) (never
executed)"
"                            Index Cond: (actor_id = actor.id)"
"        SubPlan 2"
"          ->  Limit  (cost=8.30..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=3072)"
"                ->  Sort  (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3072)"
"                      Sort Key: contact_1.rank"
"                      Sort Method: quicksort  Memory: 25kB"
"                      ->  Index Scan using contact_actor_idx on contact contact_1  (cost=0.28..8.29 rows=1 width=8)
(actualtime=0.002..0.003 rows=0 loops=3072)" 
"                            Index Cond: (actor_id = actor.id)"
"  ->  Index Scan using ir_actor_id_idx on f_intervenant_ref  (cost=0.28..62.58 rows=8 width=79) (actual
time=0.386..0.449rows=8 loops=1)" 
"        Filter: ((ir_dos_id)::text = '5226'::text)"
"        Rows Removed by Filter: 891"
"Planning time: 0.864 ms"
"Execution time: 6962.773 ms"

>The other option is to use EXISTS:
>...
Same 7 secs.

A scaled-down version of my view runs quickly, so I'll gradually start inserting the missing pieces until I find the
culprit.I'll let you know the cause when I find it. 
Thx again,
Mark Watson

Re: Slow query with join

From

Marc Watson

Date:

17 March 2015, 21:52:11

Update :

My query  SELECT * FROM v_actor JOIN f_intervenant_ref ON (actor_id = ir_actor_id) WHERE ir_dos_id = '5226' took 7
secs.
If I substitute the _RETURN rule for the view and add the above join, it takes  31 ms.

Mark Watson
Service au client - R&D
Tél. 418 659-7272 ou 1 888 692-1050
www.jurisconcept.ca