Thread: Bad plan using join on VALUES

Bad plan using join on VALUES

From
Listmail
Date:
    Table definition and problem query is below. I'm surprised...

caillaudangers=> \d relations
                           Table « public.relations »
   Colonne   |  Type   |                     Modificateurs
------------+---------+--------------------------------------------------------
  parent_id  | integer | not null
  child_id   | integer | not null
  klass      | integer | not null
  id         | integer | not null default
nextval('relations_id_seq'::regclass)
  sort_order | integer |
  data       | bytea   |
  tree_vis   | boolean |
  main_path  | boolean |
  index_id   | integer |
Index :
     « relations_pkey » PRIMARY KEY, btree (id)
     « relations_unique » UNIQUE, btree (parent_id, child_id)
     « relations_child » btree (child_id)
     « relations_sort » btree (parent_id, klass, sort_order) CLUSTER
     « relations_tree » btree (parent_id, klass, sort_order) WHERE tree_vis
= true
Contraintes de clés étrangères :
     « klass_fk » FOREIGN KEY (klass) REFERENCES relation_klasses(id)
     « relations_child_id_fkey » FOREIGN KEY (child_id) REFERENCES
nodes(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
     « relations_node_id_fkey » FOREIGN KEY (parent_id) REFERENCES
nodes(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED

caillaudangers=> CLUSTER relations; ANALYZE relations;
CLUSTER
ANALYZE


caillaudangers=> EXPLAIN ANALYZE SELECT * FROM relations WHERE child_id IN
( [60 integers] );

                                                                      QUERY   
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on relations  (cost=176.93..383.43 rows=236 width=58)
(actual time=0.298..0.482 rows=350 loops=1)
    Recheck Cond: (child_id = ANY ('  [60 integers]
    ->  Bitmap Index Scan on relations_child  (cost=0.00..176.87 rows=236
width=0) (actual time=0.281..0.281 rows=350 loops=1)
          Index Cond: (child_id = ANY  [60 integers]
  Total runtime: 0.582 ms
(5 lignes)

    OK, Screaming fast ! (and it doesn't use the CLUSTER)

caillaudangers=> explain analyze SELECT target.* FROM relations AS target,
(VALUES (8695), (8743), (10309), (22000), (22980), (23016), (8683),
(25092), (13369), (13377), (13375), (13371), (13373), (25126), (10629),
(13363), (13365), (22820), (15705), (13367), (8759), (8783), (8815),
(23018), (8781), (8765), (23012), (23036), (23046), (8803), (25226),
(22940), (8771), (8769), (13335), (23004), (22816), (23062), (8805),
(8727), (13379), (23064), (23032), (24976), (8807), (23024), (8787),
(23028), (8809), (8735), (23042), (8813), (8731), (22964), (8755), (8697),
(8749), (22974), (8733), (8715)) AS source WHERE target.child_id =
source.column1;
                                                         QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.50..542.68 rows=216 width=58) (actual
time=0.395..45.402 rows=350 loops=1)
    Hash Cond: (target.child_id = "*VALUES*".column1)
    ->  Seq Scan on relations target  (cost=0.00..440.29 rows=26329
width=58) (actual time=0.011..8.213 rows=26329 loops=1)
    ->  Hash  (cost=0.75..0.75 rows=60 width=4) (actual time=0.096..0.096
rows=60 loops=1)
          ->  Values Scan on "*VALUES*"  (cost=0.00..0.75 rows=60 width=4)
(actual time=0.001..0.049 rows=60 loops=1)
  Total runtime: 45.594 ms

    Argh. Where did my index go ? [shoots self in foot]




Re: Bad plan using join on VALUES

From
Martijn van Oosterhout
Date:
On Tue, Apr 10, 2007 at 10:57:43AM +0200, Listmail wrote:
>
>     Table definition and problem query is below. I'm surprised...

Well, the planner probably guessed that in your case it's faster to
scan the table than to use the index (indexes are not free). Did it
choose wrong? If you disable the seqscan, does it get faster (set
enable_seqscan=false). Make sure you run both a few times to make sure
you're getting good results.

If it turns out the planner is wrong, you need to do some tuning, in
particular random_page_cost and effective_cache_size.
>                                                         QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=1.50..542.68 rows=216 width=58) (actual
> time=0.395..45.402 rows=350 loops=1)
>    Hash Cond: (target.child_id = "*VALUES*".column1)
>    ->  Seq Scan on relations target  (cost=0.00..440.29 rows=26329
> width=58) (actual time=0.011..8.213 rows=26329 loops=1)
>    ->  Hash  (cost=0.75..0.75 rows=60 width=4) (actual time=0.096..0.096
> rows=60 loops=1)
>          ->  Values Scan on "*VALUES*"  (cost=0.00..0.75 rows=60 width=4)
> (actual time=0.001..0.049 rows=60 loops=1)
>  Total runtime: 45.594 ms

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Bad plan using join on VALUES (and now on temp table too)

From
Listmail
Date:
> Well, the planner probably guessed that in your case it's faster to
> scan the table than to use the index (indexes are not free). Did it
> choose wrong?

    Yes, see the other query in my post...

    id IN ( 60 values ) => 0.582 ms (bitmap scan on the index : perfect)
    join with VALUES => 45.594 ms (seq scan)

> If you disable the seqscan, does it get faster (set
> enable_seqscan=false). Make sure you run both a few times to make sure
> you're getting good results.

    Yeah, everything is in cache.
    With enable_seq to 0, it does a nested loop at 1.190 ms

> If it turns out the planner is wrong, you need to do some tuning, in
> particular random_page_cost and effective_cache_size.

    I don't want to screw my tuning (which works well for all other queries)
because of this single one !
    I modified the website to use IN(), but I thought it might be a bug in
the planner...

    I'll repost.
    I believe the planner does not consider the right plan. (see more below)

EXPLAIN ANALYZE SELECT * FROM relations WHERE child_id
IN(
8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715
 
) ;

                                                                      QUERY   
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on relations  (cost=176.93..383.43 rows=236 width=58)
(actual time=0.295..0.470 rows=350 loops=1)
    Recheck Cond: (child_id = ANY

('{8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715}'::integer[]))
    ->  Bitmap Index Scan on relations_child  (cost=0.00..176.87 rows=236
width=0) (actual time=0.279..0.279 rows=350 loops=1)
          Index Cond: (child_id = ANY

('{8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715}'::integer[]))
  Total runtime: 0.571 ms

explain analyze SELECT target.* FROM relations AS target, (VALUES
(8695::INTEGER), (8743), (10309), (22000), (22980), (23016), (8683),
(25092), (13369), (13377), (13375), (13371), (13373), (25126), (10629),
(13363), (13365), (22820), (15705), (13367), (8759), (8783), (8815),
(23018), (8781), (8765), (23012), (23036), (23046), (8803), (25226),
(22940), (8771), (8769), (13335), (23004), (22816), (23062), (8805),
(8727), (13379), (23064), (23032), (24976), (8807), (23024), (8787),
(23028), (8809), (8735), (23042), (8813), (8731), (22964), (8755), (8697),
(8749), (22974), (8733), (8715)) AS source WHERE target.child_id =
source.column1;
                                                         QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.50..542.68 rows=216 width=58) (actual
time=0.349..44.907 rows=350 loops=1)
    Hash Cond: (target.child_id = "*VALUES*".column1)
    ->  Seq Scan on relations target  (cost=0.00..440.29 rows=26329
width=58) (actual time=0.011..8.250 rows=26334 loops=1)
    ->  Hash  (cost=0.75..0.75 rows=60 width=4) (actual time=0.076..0.076
rows=60 loops=1)
          ->  Values Scan on "*VALUES*"  (cost=0.00..0.75 rows=60 width=4)
(actual time=0.001..0.035 rows=60 loops=1)
  Total runtime: 45.048 ms

SET enable_seqscan TO 0;
SET
caillaudangers=> explain analyze SELECT target.* FROM relations AS target,
(VALUES (8695::INTEGER), (8743), (10309), (22000), (22980), (23016),
(8683), (25092), (13369), (13377), (13375), (13371), (13373), (25126),
(10629), (13363), (13365), (22820), (15705), (13367), (8759), (8783),
(8815), (23018), (8781), (8765), (23012), (23036), (23046), (8803),
(25226), (22940), (8771), (8769), (13335), (23004), (22816), (23062),
(8805), (8727), (13379), (23064), (23032), (24976), (8807), (23024),
(8787), (23028), (8809), (8735), (23042), (8813), (8731), (22964), (8755),
(8697), (8749), (22974), (8733), (8715)) AS source WHERE target.child_id =
source.column1;
                                                                 QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..756.21 rows=216 width=58) (actual
time=0.023..1.151 rows=350 loops=1)
    ->  Values Scan on "*VALUES*"  (cost=0.00..0.75 rows=60 width=4)
(actual time=0.002..0.041 rows=60 loops=1)
    ->  Index Scan using relations_child on relations target
(cost=0.00..12.54 rows=4 width=58) (actual time=0.005..0.013 rows=6
loops=60)
          Index Cond: (target.child_id = "*VALUES*".column1)
  Total runtime: 1.265 ms


    ------------------------------

    I stuffed the 60 integer values into a TEMP TABLE instead of using VALUES
or IN, and the plans are also very wrong :

caillaudangers=> INSERT INTO tmp VALUES (8695::INTEGER), (8743), (10309),
(22000), (22980), (23016), (8683), (25092), (13369), (13377), (13375),
(13371), (13373), (25126), (10629), (13363), (13365), (22820), (15705),
(13367), (8759), (8783), (8815), (23018), (8781), (8765), (23012),
(23036), (23046), (8803), (25226), (22940), (8771), (8769), (13335),
(23004), (22816), (23062), (8805), (8727), (13379), (23064), (23032),
(24976), (8807), (23024), (8787), (23028), (8809), (8735), (23042),
(8813), (8731), (22964), (8755), (8697), (8749), (22974), (8733), (8715);
INSERT 0 60
caillaudangers=> ANALYZE relations;
ANALYZE
caillaudangers=> ANALYZE tmp;
ANALYZE
caillaudangers=> EXPLAIN ANALYZE SELECT * FROM relations WHERE child_id IN
(SELECT id FROM tmp);
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
  Hash IN Join  (cost=2.35..543.39 rows=195 width=58) (actual
time=0.367..36.242 rows=350 loops=1)
    Hash Cond: (relations.child_id = tmp.id)
    ->  Seq Scan on relations  (cost=0.00..440.34 rows=26334 width=58)
(actual time=0.009..15.604 rows=26334 loops=1)
    ->  Hash  (cost=1.60..1.60 rows=60 width=4) (actual time=0.065..0.065
rows=60 loops=1)
          ->  Seq Scan on tmp  (cost=0.00..1.60 rows=60 width=4) (actual
time=0.004..0.024 rows=60 loops=1)
  Total runtime: 36.396 ms
(6 lignes)

caillaudangers=> EXPLAIN ANALYZE SELECT r.* FROM relations r, tmp t WHERE
r.child_id=t.id;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=2.35..543.39 rows=195 width=58) (actual
time=0.403..33.120 rows=350 loops=1)
    Hash Cond: (r.child_id = t.id)
    ->  Seq Scan on relations r  (cost=0.00..440.34 rows=26334 width=58)
(actual time=0.011..14.987 rows=26334 loops=1)
    ->  Hash  (cost=1.60..1.60 rows=60 width=4) (actual time=0.061..0.061
rows=60 loops=1)
          ->  Seq Scan on tmp t  (cost=0.00..1.60 rows=60 width=4) (actual
time=0.004..0.024 rows=60 loops=1)
  Total runtime: 33.266 ms