Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance - Mailing list pgsql-performance

From Mario Splivalo
Subject Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance
Date
Msg-id 49D9F39F.40106@megafon.hr
Whole thread Raw
In response to Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-performance
Scott Marlowe wrote:
>
> It's not really solved, it's just a happy coincidence that the current
> plan runs well.  In order to keep the query planner making good
> choices you need to increase stats target for the field in the index
> above.  The easiest way to do so is to do this:
>
> alter database mydb set default_statistics_target=100;
>
> and run analyze again:
>
> analyze;

So, i removed the index on field_name, set
default_default_statistics_target to 100, analyzed, and the results are
the same:

                                                              QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.04..58676.31 rows=218048 width=68) (actual
time=0.067..12268.394 rows=6 loops=1)
    Hash Cond: ((u.field_name)::text = (t.key)::text)
    ->  Seq Scan on photo_info_data u  (cost=0.00..47500.30 rows=2398530
width=50) (actual time=0.013..6426.611 rows=2398446 loops=1)
    ->  Hash  (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
          ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.006 rows=2 loops=1)
  Total runtime: 12268.459 ms
(6 rows)

I even changed default_statistics_target to 1000:


------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.04..58580.29 rows=208561 width=67) (actual
time=0.054..12434.283 rows=6 loops=1)
    Hash Cond: ((u.field_name)::text = (t.key)::text)
    ->  Seq Scan on photo_info_data u  (cost=0.00..47499.46 rows=2398446
width=49) (actual time=0.012..6129.923 rows=2398446 loops=1)
    ->  Hash  (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
          ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.004 rows=2 loops=1)
  Total runtime: 12434.338 ms
(6 rows)


Even when I run this query, I get sequential scan:

explain analyze select * from photo_info_data where field_name =
'f-spot' or field_name = 'shutter';

                                                     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
  Seq Scan on photo_info_data  (cost=0.00..59491.69 rows=1705 width=49)
(actual time=0.018..1535.963 rows=6 loops=1)
    Filter: (((field_name)::text = 'f-spot'::text) OR
((field_name)::text = 'shutter'::text))
  Total runtime: 1536.010 ms
(3 rows)

These are the representations of te values 'f-spot' and 'shutter' for
the field field_name in photo_info_data table:

xmltest=# select field_name, count(*) from user_info_data where
field_name in ('visina', 'spol') group by field_name;
  field_name | count
------------+-------
  'f-spot'   |     3
  'shutter'  |     3
(2 rows)


Maybe my test-data is poor? As I've mentioned, photo_info_data has
little over 2300000 rows. And this is complete 'distribution' of the data:

xmltest=# select field_name, count(*) from user_info_data group by
field_name order by count(*) desc;
    field_name   | count
----------------+--------
  field_Xx1      | 350000
  field_Xx2      | 332447
  field_Xx3      | 297414
  field_Xx4      | 262394
  field_Xx5      | 227396
  field_Xx6      | 192547
  field_Xx7      | 157612
  field_Xx8      | 122543
  field_Xx9      |  87442
  field_Xx10     |  52296
  field_1        |  50000
  field_2        |  47389
  field_3        |  42412
  field_4        |  37390
  field_5        |  32366
  field_6        |  27238
  field_7        |  22360
  field_Xx11     |  17589
  field_8        |  17412
  field_9        |  12383
  field_10       |   7386
  field_11       |   2410
  f-spot         |      3
  shutter        |      3
  focal          |      3
  flash          |      3
  m_city         |      3
  person         |      3
  iso            |      2
(29 rows)

No matter what field_name value I enter in WHERE condition, planner
chooses sequential scan. Only when I add seperate index on field_name,
planner chooes index scan or bitmap index scan.

    Mike

pgsql-performance by date:

Previous
From: Matthew Wakeling
Date:
Subject: Re: plpgsql arrays
Next
From: Matthew Wakeling
Date:
Subject: Re: difficulties with time based queries