Thread: seq scan issue...

seq scan issue...

From
kevin kempter
Date:
Hi List;

I have a large tble (playback_device) with 6million rows in it. The
aff_id_tmp1 table has 600,000 rows.

I also have this query:
select distinct
tmp1.affiliate_id,
tmp1.name,
tmp1.description,
tmp1.create_dt,
tmp1.playback_device_id,
pf.segment_id
from
aff_id_tmp1 tmp1,
playback_fragment pf
where
tmp1.playback_device_id = pf.playback_device_id ;


The Primary Key for playback_device is the playback_device_id
there is also an index on playback_device_id on the aff_id_tmp1 table.
The only join condition I have is on this key pair (I've posted my
explain plan below)


- why am I still getting a seq scan ?

Thanks in advance.






============
Explain PLan
============

explain
select distinct
tmp1.affiliate_id,
tmp1.name,
tmp1.description,
tmp1.create_dt,
tmp1.playback_device_id,
pf.segment_id
from
aff_id_tmp1 tmp1,
playback_fragment pf
where
tmp1.playback_device_id = pf.playback_device_id ;


  Unique  (cost=2966361.56..3194555.91 rows=10104496 width=97)
    ->  Sort  (cost=2966361.56..2998960.76 rows=13039677 width=97)
          Sort Key: tmp1.affiliate_id, tmp1.name, tmp1.description,
tmp1.create_dt,
tmp1.playback_device_id, pf.segment_id
          ->  Hash Join  (cost=23925.45..814071.14 rows=13039677
width=97)
                Hash Cond: (pf.playback_device_id =
tmp1.playback_device_id)
                ->  Seq Scan on playback_fragment pf
(cost=0.00..464153.77 rows=130
39677 width=16)
                ->  Hash  (cost=16031.31..16031.31 rows=631531 width=89)
                      ->  Seq Scan on aff_id_tmp1 tmp1
(cost=0.00..16031.31 rows=63
1531 width=89)
(1068 rows)


Re: seq scan issue...

From
"Jeffrey Baker"
Date:
On Thu, Apr 17, 2008 at 11:24 AM, kevin kempter
<kevin@kevinkempterllc.com> wrote:
> Hi List;
>
>  I have a large tble (playback_device) with 6million rows in it. The
> aff_id_tmp1 table has 600,000 rows.
>  - why am I still getting a seq scan ?
>

You're selecting almost all the rows in the product of aff_id_tmp1 *
playback_fragment.  A sequential scan will be far faster than an index
scan.  You can prove this to yourself using 'set enable_seqscan to
false' and running the query again.  It should be much slower.

Re: seq scan issue...

From
Rodrigo Gonzalez
Date:
kevin kempter escribió:
> Hi List;
>
> I have a large tble (playback_device) with 6million rows in it. The
> aff_id_tmp1 table has 600,000 rows.
>
> I also have this query:
> select distinct
> tmp1.affiliate_id,
> tmp1.name,
> tmp1.description,
> tmp1.create_dt,
> tmp1.playback_device_id,
> pf.segment_id
> from
> aff_id_tmp1 tmp1,
> playback_fragment pf
> where
> tmp1.playback_device_id = pf.playback_device_id ;
>
>
> The Primary Key for playback_device is the playback_device_id
> there is also an index on playback_device_id on the aff_id_tmp1 table.
> The only join condition I have is on this key pair (I've posted my
> explain plan below)
>
>
> - why am I still getting a seq scan ?
>
> Thanks in advance.
>
>
>
>
>
>
> ============
> Explain PLan
> ============
>
> explain
> select distinct
> tmp1.affiliate_id,
> tmp1.name,
> tmp1.description,
> tmp1.create_dt,
> tmp1.playback_device_id,
> pf.segment_id
> from
> aff_id_tmp1 tmp1,
> playback_fragment pf
> where
> tmp1.playback_device_id = pf.playback_device_id ;
>
>
>  Unique  (cost=2966361.56..3194555.91 rows=10104496 width=97)
>    ->  Sort  (cost=2966361.56..2998960.76 rows=13039677 width=97)
>          Sort Key: tmp1.affiliate_id, tmp1.name, tmp1.description,
> tmp1.create_dt,
> tmp1.playback_device_id, pf.segment_id
>          ->  Hash Join  (cost=23925.45..814071.14 rows=13039677 width=97)
>                Hash Cond: (pf.playback_device_id =
> tmp1.playback_device_id)
>                ->  Seq Scan on playback_fragment pf
> (cost=0.00..464153.77 rows=130
> 39677 width=16)
>                ->  Hash  (cost=16031.31..16031.31 rows=631531 width=89)
>                      ->  Seq Scan on aff_id_tmp1 tmp1
> (cost=0.00..16031.31 rows=63
> 1531 width=89)
> (1068 rows)
>
>
Cause you are getting all the rows so pgsql need to scan all the table...


Attachment

Re: seq scan issue...

From
PFC
Date:
> - why am I still getting a seq scan ?

You'll seq scan tmp1 obviously, and also the other table since you fetch a
very large part of it in the process.
It's the only way to do this query since there is no WHERE to restrict the
number of rows and the DISTINCT applies on columns from both tables.

You might want to simplify your query. For instance perhaps you can get
pf.segment_id out of the DISTINCT, in which case you can put the distinct
in a subquery on tmp1.

>
> Thanks in advance.
>
>
>
>
>
>
> ============
> Explain PLan
> ============
>
> explain
> select distinct
> tmp1.affiliate_id,
> tmp1.name,
> tmp1.description,
> tmp1.create_dt,
> tmp1.playback_device_id,
> pf.segment_id
> from
> aff_id_tmp1 tmp1,
> playback_fragment pf
> where
> tmp1.playback_device_id = pf.playback_device_id ;
>
>
>   Unique  (cost=2966361.56..3194555.91 rows=10104496 width=97)
>     ->  Sort  (cost=2966361.56..2998960.76 rows=13039677 width=97)
>           Sort Key: tmp1.affiliate_id, tmp1.name, tmp1.description,
> tmp1.create_dt,
> tmp1.playback_device_id, pf.segment_id
>           ->  Hash Join  (cost=23925.45..814071.14 rows=13039677
> width=97)
>                 Hash Cond: (pf.playback_device_id =
> tmp1.playback_device_id)
>                 ->  Seq Scan on playback_fragment pf
> (cost=0.00..464153.77 rows=130
> 39677 width=16)
>                 ->  Hash  (cost=16031.31..16031.31 rows=631531 width=89)
>                       ->  Seq Scan on aff_id_tmp1 tmp1
> (cost=0.00..16031.31 rows=63
> 1531 width=89)
> (1068 rows)
>
>