A "set seqscan='f'" query improvement example - Mailing list pgsql-general

From Jon Lapham
Subject A "set seqscan='f'" query improvement example
Date
Msg-id 3CFE2EE2.3080709@extracta.com.br
Whole thread Raw
List pgsql-general
Hello-

I'm not sure this is a problem, but I thought I woud report this anyway.
  If I "set seqscan='f'" on the following query, I see a HUGE
improvement in query time (factor of 6).  Maybe this example will be
helpful to someone working on the planner.

Note that tmp_platedataset is a temporary table.  Everything has been
VACUUM ANALYZE'd before issuing any queries, including the temp table.

The 3 table join query:
SELECT pp.sampleid
FROM   tmp_platedataset tpd, plate_dataset_result pdr, plate_pos pp
WHERE  pdr.pp_plateid=pp.plateid
   AND  pdr.col=pp.col
   AND  pdr.row=pp.row
   AND  pdr.platedatasetid=tpd.platedatasetid
   AND  tpd.toview='t';

The tables:
It seems that "\d" doesn't work with temp tables... hmm, anyway,
tmp_platedataset contains 8 columns, but the important one is
"platedatasetid int4".  There are no triggers, rules, associated with
this table.
main_v0_8=# select count(*) from tmp_platedataset;
  count
-------
    401
(1 row)

main_v0_8=# \d plate_dataset_result
        Table "plate_dataset_result"
      Column     |     Type     | Modifiers
----------------+--------------+-----------
  platedatasetid | integer      | not null
  row            | character(1) |
  col            | smallint     |
  result         | real         |
  pp_plateid     | integer      | not null
Indexes: pdsr_pp_plateid
Unique keys: plate_dataset_platedataseti_key
Triggers: RI_ConstraintTrigger_26388004,
           RI_ConstraintTrigger_23833439
main_v0_8=# select count(*) from plate_dataset_result;
   count
---------
  2192544
(1 row)

main_v0_8=# \d plate_pos
                 Table "plate_pos"
    Column    |          Type          | Modifiers
-------------+------------------------+-----------
  plateid     | integer                | not null
  sampleid    | integer                |
  description | character varying(255) |
  row         | character(1)           |
  col         | smallint               |
  conc        | real                   |
Indexes: plate_pos_sampleid
Unique keys: plate_pos_plateid_key
Triggers: RI_ConstraintTrigger_23833631,
           RI_ConstraintTrigger_23833625
main_v0_8=# select count(*) from plate_pos;
  count
-------
  46393
(1 row)

Here is the long query.

main_v0_8=# set enable_seqscan='t';
SET VARIABLE
main_v0_8=# explain ANALYZE SELECT pp.sampleid FROM tmp_platedataset
tpd, plate_dataset_result pdr, plate_pos pp  WHERE
pdr.pp_plateid=pp.plateid AND pdr.col=pp.col AND pdr.row=pp.row AND
pdr.platedatasetid=tpd.platedatasetid AND tpd.toview='t';
NOTICE:  QUERY PLAN:

Merge Join  (cost=48222.32..50949.26 rows=30380 width=34) (actual
time=18350.80..19147.17 rows=32078 loops=1)
   ->  Index Scan using plate_pos_plateid_key on plate_pos pp
(cost=0.00..1989.92 rows=46393 width=15) (actual time=0.05..195.70
rows=19342 loops=1)
   ->  Sort  (cost=48222.32..48222.32 rows=35759 width=19) (actual
time=18350.60..18396.26 rows=38496 loops=1)
         ->  Hash Join  (cost=22.02..45517.89 rows=35759 width=19)
(actual time=10184.60..17237.49 rows=38496 loops=1)
               ->  Seq Scan on plate_dataset_result pdr
(cost=0.00..35073.93 rows=1994993 width=15) (actual time=0.07..12277.11
rows=2192544 loops=1)
               ->  Hash  (cost=21.01..21.01 rows=401 width=4) (actual
time=3.57..3.57 rows=0 loops=1)
                     ->  Seq Scan on tmp_platedataset tpd
(cost=0.00..21.01 rows=401 width=4) (actual time=0.41..2.60 rows=401
loops=1)
Total runtime: 19199.12 msec




Now, if I run this with heavy seqscan penalties, I can use "EXPLAIN
ANALYZE":

main_v0_8=# set enable_seqscan='f';
SET VARIABLE
main_v0_8=# explain ANALYZE SELECT pp.sampleid FROM tmp_platedataset
tpd, plate_dataset_result pdr, plate_pos pp  WHERE
pdr.pp_plateid=pp.plateid AND pdr.col=pp.col AND pdr.row=pp.row AND
pdr.platedatasetid=tpd.platedatasetid AND tpd.toview='t';
NOTICE:  QUERY PLAN:

Merge Join  (cost=100134246.14..100136973.08 rows=30380 width=34)
(actual time=1697.54..3011.73 rows=32078 loops=1)
   ->  Sort  (cost=100134246.14..100134246.14 rows=35759 width=19)
(actual time=1697.30..1745.48 rows=38496 loops=1)
         ->  Nested Loop  (cost=100000000.00..100131541.72 rows=35759
width=19) (actual time=0.44..619.23 rows=38496 loops=1)
               ->  Seq Scan on tmp_platedataset tpd
(cost=100000000.00..100000021.01 rows=401 width=4) (actual
time=0.37..4.05 rows=401 loops=1)
               ->  Index Scan using plate_dataset_platedataseti_key on
plate_dataset_result pdr  (cost=0.00..326.87 rows=89 width=15) (actual
time=0.03..1.01 rows=96 loops=401)
   ->  Index Scan using plate_pos_plateid_key on plate_pos pp
(cost=0.00..1989.92 rows=46393 width=15) (actual time=0.08..429.31
rows=43181 loops=1)
Total runtime: 3069.29 msec

-Jon

PS: If I can provide or do anything else that may be useful to anyone,
let me know!

--

-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham
  Extracta Moléculas Naturais, Rio de Janeiro, Brasil
  email: lapham@extracta.com.br      web: http://www.extracta.com.br/
***-*--*----*-------*------------*--------------------*---------------


pgsql-general by date:

Previous
From: tony
Date:
Subject: need magic formula for euro conversion
Next
From: Andrew Sullivan
Date:
Subject: Re: Re^3 : Solaris Performance - 64 bit puzzle