Thread: seq scan issue...
Hi List; I have a large tble (playback_device) with 6million rows in it. The aff_id_tmp1 table has 600,000 rows. I also have this query: select distinct tmp1.affiliate_id, tmp1.name, tmp1.description, tmp1.create_dt, tmp1.playback_device_id, pf.segment_id from aff_id_tmp1 tmp1, playback_fragment pf where tmp1.playback_device_id = pf.playback_device_id ; The Primary Key for playback_device is the playback_device_id there is also an index on playback_device_id on the aff_id_tmp1 table. The only join condition I have is on this key pair (I've posted my explain plan below) - why am I still getting a seq scan ? Thanks in advance. ============ Explain PLan ============ explain select distinct tmp1.affiliate_id, tmp1.name, tmp1.description, tmp1.create_dt, tmp1.playback_device_id, pf.segment_id from aff_id_tmp1 tmp1, playback_fragment pf where tmp1.playback_device_id = pf.playback_device_id ; Unique (cost=2966361.56..3194555.91 rows=10104496 width=97) -> Sort (cost=2966361.56..2998960.76 rows=13039677 width=97) Sort Key: tmp1.affiliate_id, tmp1.name, tmp1.description, tmp1.create_dt, tmp1.playback_device_id, pf.segment_id -> Hash Join (cost=23925.45..814071.14 rows=13039677 width=97) Hash Cond: (pf.playback_device_id = tmp1.playback_device_id) -> Seq Scan on playback_fragment pf (cost=0.00..464153.77 rows=130 39677 width=16) -> Hash (cost=16031.31..16031.31 rows=631531 width=89) -> Seq Scan on aff_id_tmp1 tmp1 (cost=0.00..16031.31 rows=63 1531 width=89) (1068 rows)
On Thu, Apr 17, 2008 at 11:24 AM, kevin kempter <kevin@kevinkempterllc.com> wrote: > Hi List; > > I have a large tble (playback_device) with 6million rows in it. The > aff_id_tmp1 table has 600,000 rows. > - why am I still getting a seq scan ? > You're selecting almost all the rows in the product of aff_id_tmp1 * playback_fragment. A sequential scan will be far faster than an index scan. You can prove this to yourself using 'set enable_seqscan to false' and running the query again. It should be much slower.
kevin kempter escribió: > Hi List; > > I have a large tble (playback_device) with 6million rows in it. The > aff_id_tmp1 table has 600,000 rows. > > I also have this query: > select distinct > tmp1.affiliate_id, > tmp1.name, > tmp1.description, > tmp1.create_dt, > tmp1.playback_device_id, > pf.segment_id > from > aff_id_tmp1 tmp1, > playback_fragment pf > where > tmp1.playback_device_id = pf.playback_device_id ; > > > The Primary Key for playback_device is the playback_device_id > there is also an index on playback_device_id on the aff_id_tmp1 table. > The only join condition I have is on this key pair (I've posted my > explain plan below) > > > - why am I still getting a seq scan ? > > Thanks in advance. > > > > > > > ============ > Explain PLan > ============ > > explain > select distinct > tmp1.affiliate_id, > tmp1.name, > tmp1.description, > tmp1.create_dt, > tmp1.playback_device_id, > pf.segment_id > from > aff_id_tmp1 tmp1, > playback_fragment pf > where > tmp1.playback_device_id = pf.playback_device_id ; > > > Unique (cost=2966361.56..3194555.91 rows=10104496 width=97) > -> Sort (cost=2966361.56..2998960.76 rows=13039677 width=97) > Sort Key: tmp1.affiliate_id, tmp1.name, tmp1.description, > tmp1.create_dt, > tmp1.playback_device_id, pf.segment_id > -> Hash Join (cost=23925.45..814071.14 rows=13039677 width=97) > Hash Cond: (pf.playback_device_id = > tmp1.playback_device_id) > -> Seq Scan on playback_fragment pf > (cost=0.00..464153.77 rows=130 > 39677 width=16) > -> Hash (cost=16031.31..16031.31 rows=631531 width=89) > -> Seq Scan on aff_id_tmp1 tmp1 > (cost=0.00..16031.31 rows=63 > 1531 width=89) > (1068 rows) > > Cause you are getting all the rows so pgsql need to scan all the table...
Attachment
> - why am I still getting a seq scan ? You'll seq scan tmp1 obviously, and also the other table since you fetch a very large part of it in the process. It's the only way to do this query since there is no WHERE to restrict the number of rows and the DISTINCT applies on columns from both tables. You might want to simplify your query. For instance perhaps you can get pf.segment_id out of the DISTINCT, in which case you can put the distinct in a subquery on tmp1. > > Thanks in advance. > > > > > > > ============ > Explain PLan > ============ > > explain > select distinct > tmp1.affiliate_id, > tmp1.name, > tmp1.description, > tmp1.create_dt, > tmp1.playback_device_id, > pf.segment_id > from > aff_id_tmp1 tmp1, > playback_fragment pf > where > tmp1.playback_device_id = pf.playback_device_id ; > > > Unique (cost=2966361.56..3194555.91 rows=10104496 width=97) > -> Sort (cost=2966361.56..2998960.76 rows=13039677 width=97) > Sort Key: tmp1.affiliate_id, tmp1.name, tmp1.description, > tmp1.create_dt, > tmp1.playback_device_id, pf.segment_id > -> Hash Join (cost=23925.45..814071.14 rows=13039677 > width=97) > Hash Cond: (pf.playback_device_id = > tmp1.playback_device_id) > -> Seq Scan on playback_fragment pf > (cost=0.00..464153.77 rows=130 > 39677 width=16) > -> Hash (cost=16031.31..16031.31 rows=631531 width=89) > -> Seq Scan on aff_id_tmp1 tmp1 > (cost=0.00..16031.31 rows=63 > 1531 width=89) > (1068 rows) > >