query takes 65 times longer if I add 1 column (explain attached) - Mailing list pgsql-general

From Dan Weber
Subject query takes 65 times longer if I add 1 column (explain attached)
Date
Msg-id 5dfa06e10611020912m1ff63bc3j516b7488914978cb@mail.gmail.com
Whole thread Raw
Responses Re: query takes 65 times longer if I add 1 column (explain attached)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general

I've got a query that takes a very long time to execute if I add an
additional column to the SELECT clause.

Here is Query A, the slow one.  I've numbered the rows
of the EXPLAIN for convenience.

psql=# explain analyze SELECT timestamp, src_ip, dst_ip, id,
  string_one, string_two FROM table_one LEFT JOIN table_two USING(id)
  WHERE TRUE AND src_ip = '192.168.10.128' ORDER BY timestamp DESC
  LIMIT 10;


                 QUERY PLAN

-------------------------------------------------------------------

a1. Limit (cost=18954.93..18954.95 rows=10 width=136)
    (actual time=169038.823..169038.851 rows=10 loops=1)

a2.   ->  Sort  (cost= 18954.93..18959.75 rows=1929 width=136)
      (actual time=169038.816..169038.825 rows=10 loops=1)

a3.         Sort Key: table_one."timestamp"

a4.         ->  Nested Loop Left Join
        (cost= 0.00..18849.66 rows=1929 width=136)
        (actual time=862.661..169036.636 rows=179 loops=1)

a5.               Join Filter: ("outer".id = "inner".id)

a6.               ->  Index Scan using table_one_type_key on
              table_one  (cost=0.00..17.70 rows=3 width=72)
              (actual time=0.083..864.386 rows=179 loops=1)

a7.                     Index Cond: (src_ip = '192.168.10.128 '::inet)

a8.               ->  Seq Scan on table_two  (cost=0.00..4669.92
              rows=128592 width=68)
              (actual time=842.166..933.496 rows=4202 loops=179)

 Total runtime: 169039.368 ms
(9 rows)

Time: 169041.495 ms

It takes almost 3 minutes to run.  If I knock off either string_one or
string_two, the query runs in <3 seconds.  Here is Query B:

psql=# explain analyze SELECT timestamp, src_ip, dst_ip, id,
  string_one FROM table_one LEFT JOIN table_two USING(id) WHERE TRUE
  AND src_ip = '192.168.10.128' ORDER BY timestamp DESC LIMIT 10;

                 QUERY PLAN

-------------------------------------------------------------------

b1. Limit  (cost=17617.45..17617.47 rows=10 width=104
    (actual time=2644.975..2645.003 rows=10 loops=1)

b2.   ->  Sort  (cost=17617.45..17622.27 rows=1929 width=104)
      (actual time=2644.968..2644.978 rows=10 loops=1)

b3.         Sort Key: table_one."timestamp"

b4.         ->  Nested Loop Left Join  (cost=5810.75..17512.19
        rows=1929 width=104)
        (actual time=956.129..2643.458 rows=179 loops=1)

b5.               Join Filter: ("outer".id = "inner".id)

b6.               ->  Bitmap Heap Scan on table_one
              (cost=7.24..13.72 rows=3 width=72)
              (actual time=39.745..67.818 rows=179 loops=1)

b7.                     Recheck Cond: (src_ip = '192.168.10.128'::inet)

b8.                     ->  Bitmap Index Scan on table_one_type_key 
                (cost=0.00..7.24 rows=3 width=0)
                (actual time=29.399..29.399 rows=179 loops=1)

b9.                           Index Cond: (src_ip = '192.168.10.128 '::inet)

b10.              ->  Materialize  (cost=5803.51..8094.43
              rows=128592 width=36)
              (actual time=4.573..9.462 rows=4202 loops=179)

b11.                    ->  Seq Scan on table_two 
                (cost=0.00..4669.92 rows=128592 width=36)
                (actual time=818.216..895.127
                 rows=4202 loops=1)

 Total runtime: 2646.899 ms
(12 rows)


If we compare each EXPLAIN, they're about the same down through line
5, the Join Filter.  But Query A joins things from an "Index Scan" and
a "Seq Scan," while Query B joins things from a "Bitmap Index Scan"
and a "Materialize."

1. Why is the additional column causing such a big change in the
   searching technique, given that the column is not involved in
   the WHERE at all?

2. Besides indices, how can I coerce Postgres to act like Query
   B instead of like Query A?


pgsql-general by date:

Previous
From: brian
Date:
Subject: Re: Weird double single quote issue
Next
From: Tom Lane
Date:
Subject: Re: query takes 65 times longer if I add 1 column (explain attached)