Forcing seq_scan off for large table joined with tiny table yeilds improved performance

From: Mario Splivalo
Subject: Forcing seq_scan off for large table joined with tiny table yeilds improved performance
Date: ,
Msg-id: 49D0D237.7020308@megafon.hr
(view: Whole thread, Raw)
Responses: Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Tom Lane)
List: pgsql-performance

Tree view

Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Mario Splivalo, )
 Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Tom Lane, )
  Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Mario Splivalo, )
   Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Scott Marlowe, )
    Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Mario Splivalo, )
     Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Scott Marlowe, )
      Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Mario Splivalo, )
       Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Scott Marlowe, )
        Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Mario Splivalo, )
         Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Scott Marlowe, )

I have two tables, like this:

Big table:

CREATE TABLE photo_info_data
(
   photo_id integer NOT NULL,
   field_name character varying NOT NULL,
   field_value character varying,
   CONSTRAINT photo_info_data_pk PRIMARY KEY (photo_id, field_name)
)
WITH (OIDS=FALSE);

CREATE INDEX user_info_data_ix_field_value
   ON user_info_data
   USING btree
   (field_value);


Small table:

CREATE TABLE t_query_data
(
   i integer,
   "key" character varying,
   op character varying,
   "value" character varying
)
WITH (OIDS=FALSE);

I have around 2400000 rows in photo_info_data, and just two rows in
t_query_data:
  i |  key     | op | value
---+----------+----+--------
  1 | f-stop   | eq | 2.6
  2 | shutter  | gt | 1/100


This is the query I'm executing:

SELECT
    *
FROM
    photo_info_data u
    JOIN t_query_data t on u.field_name = key

This query takes around 900ms to execute. It returns 6 rows.

When I do 'explain analyze' for some reason it takes around 7 seconds,
and this is what I get:

phototest=# explain analyze select * from photo_info_data u join
t_query_data t on u.field_name = key;
                                                              QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.04..58676.31 rows=218048 width=68) (actual
time=2381.895..7087.225 rows=6 loops=1)
    Hash Cond: ((u.field_name)::text = (t.key)::text)
    ->  Seq Scan on photo_info_data u  (cost=0.00..47500.30 rows=2398530
width=50) (actual time=0.042..3454.112 rows=2398446 loops=1)
    ->  Hash  (cost=1.02..1.02 rows=2 width=18) (actual
time=0.016..0.016 rows=2 loops=1)
          ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2
width=18) (actual time=0.003..0.007 rows=2 loops=1)
  Total runtime: 7087.291 ms
(6 rows)

Time: 7088.663 ms


I can rerun this query many times, it's always around 7 seconds. I/O
wait during the query is nonexistant, it just takes 100% of CPU time (i
have a DualCore Opteron server).

If I force the planner not to use sequential_scan, here is what I get:

phototest=# explain analyze select * from photo_info_data u join
t_query_data t on u.field_name = key;
                                                                  QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=100039134.84..100130206.79 rows=218048 width=68)
(actual time=271.138..540.998 rows=6 loops=1)
    ->  Seq Scan on t_query_data t  (cost=100000000.00..100000001.02
rows=2 width=18) (actual time=0.008..0.015 rows=2 loops=1)
    ->  Bitmap Heap Scan on photo_info_data u  (cost=39134.84..63740.08
rows=109024 width=50) (actual time=270.464..270.469 rows=3 loops=2)
          Recheck Cond: ((u.field_name)::text = (t.key)::text)
          ->  Bitmap Index Scan on photo_info_data_pk
(cost=0.00..39107.59 rows=109024 width=0) (actual time=270.435..270.435
rows=3 loops=2)
                Index Cond: ((u.field_name)::text = (t.key)::text)
  Total runtime: 541.065 ms
(7 rows)

Time: 542.147 ms


The database currently has only those two tables. I have vacuumed them
prior running above queries.

I tought this information also might be important:
phototest=# select key, count(*) from photo_info_data u join
t_query_data t on u.field_name = key group by key;
   key     | count
----------+-------
  f-stop   |     3
  shutter  |     3
(2 rows)


Am I doing something wrong here? The photo_info_data would hold around
10.000.000 records, should I be doing 'set seq_scan to false' each time
I will want to run this query? (Since I'm accessing postgres trough JDBC
I'll have same situation I had weeks ago, I described it here also).

    Mike


pgsql-performance by date:

From: dan@sidhe.org
Date:
Subject: Re: Trying to track down weird query stalls
From: Robert Haas
Date:
Subject: Re: Trying to track down weird query stalls