Re: Sequencial scan in a JOIN - Mailing list pgsql-performance

From Robert Haas
Subject Re: Sequencial scan in a JOIN
Date
Msg-id CA+TgmobtviKnxJqEM9v4rSPOiNxWE8NCEPcuufm+-VdpGr8A0g@mail.gmail.com
Whole thread Raw
In response to Sequencial scan in a JOIN  (Andrew Jaimes <andrewjaimes@hotmail.com>)
List pgsql-performance
On Tue, Jun 5, 2012 at 8:48 AM, Andrew Jaimes <andrewjaimes@hotmail.com> wrote:
> Hi everyone,
>
> I am trying to run the following query:
>
> SELECT  count(1) --DISTINCT l_userqueue.queueid
>   FROM e_usersessions
>   JOIN l_userqueue
>     ON l_userqueue.userid = e_usersessions.entityid
>   JOIN a_activity
>     ON a_activity.activequeueid  = l_userqueue.queueid
>    AND a_activity.vstatus        = 1
>    AND a_activity.ventrydate     > 0
>    AND a_activity.sbuid          = e_usersessions.sbuid
>    AND a_activity.assignedtoid   = 0
>    AND a_activity.status        <> '0'
>   WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943
>
> Explain analyze:
> 'Aggregate  (cost=100402.10..100402.11 rows=1 width=0) (actual
> time=2249.051..2249.051 rows=1 loops=1)'
> '  ->  Hash Join  (cost=10.93..99795.09 rows=242803 width=0) (actual
> time=0.541..2249.027 rows=33 loops=1)'
> '        Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND
> (a_activity.sbuid = e_usersessions.sbuid))'
> '        ->  Seq Scan on a_activity  (cost=0.00..88462.52 rows=1208167
> width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
> '              Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text)
> AND (vstatus = 1) AND (assignedtoid = 0::numeric))'
> '        ->  Hash  (cost=10.86..10.86 rows=5 width=22) (actual
> time=0.053..0.053 rows=4 loops=1)'
> '              ->  Hash Join  (cost=9.38..10.86 rows=5 width=22) (actual
> time=0.033..0.048 rows=4 loops=1)'
> '                    Hash Cond: (l_userqueue.userid =
> e_usersessions.entityid)'
> '                    ->  Seq Scan on l_userqueue  (cost=0.00..1.23 rows=23
> width=27) (actual time=0.003..0.009 rows=23 loops=1)'
> '                    ->  Hash  (cost=9.31..9.31 rows=5 width=21) (actual
> time=0.018..0.018 rows=2 loops=1)'
> '                          ->  Index Scan using i06_e_usersessions on
> e_usersessions  (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012
> rows=2 loops=1)'
> '                                Index Cond: (sessionkeepalivedatetime >
> 20120605082131943::bigint)'
> 'Total runtime: 2249.146 ms'
>
> I am trying to understand the reason why the a sequencial scan is used on
> a_activity instead of using the index by activequeueid (i08_a_activity).

I'm chiming in a bit late here, but it seems like you're hoping that
the query plan will form the outer join as a nested loop, with the
inner and outer sides swapped, so that the results of the join between
l_userqueue and e_usersessions are used to drive a series of index
scans on a_activity that avoid scanning the whole table.  PostgreSQL
9.2 will be the first release that has the ability to generate that
kind of plan, so it would be interesting to see what happens if you
try this on 9.2beta.

Older releases should be able consider a nested loop join with
l_userqueue as the inner rel, driving an index scan over a_activity,
and then performing the join to e_usersessions afterwards.  But that
plan might not be nearly as good, since then we'd have to do 23
index-scans on a_activity rather than just 4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Array fundamentals
Next
From: Nick Hofstede
Date:
Subject: Re: optimizing queries using IN and EXISTS