Subplan and index usage - Mailing list pgsql-general

From Vyacheslav Kalinin
Subject Subplan and index usage
Date
Msg-id 9b1af80e0803121454y18bbb942u100fabc8b82712e@mail.gmail.com
Whole thread Raw
Responses Re: Subplan and index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Consider the following case which is almost exact snapshot of part of our scheme:
 
                       Table "cities"
    Column    |          Type          | Modifiers | Description
--------------+------------------------+-----------+-------------
 ficity_id    | integer                | not null  |
 ficountry_id | integer                |           |
 firegion_id  | integer                |           |
 fsname       | character varying(100) |           |
 fsname_ru    | character varying(200) |           |
Indexes:
    "pk_geocities" PRIMARY KEY, btree (ficity_id)
    "idx_cities_name" btree (lower(fsname::text) varchar_pattern_ops)
    "idx_cities_name_ru" btree (lower(fsname_ru::text) varchar_pattern_ops)
    "idx_geocities_country_id" btree (ficountry_id)
    "idx_geocities_region_id" btree (firegion_id)
Foreign-key constraints:
    "fk_geocities_country_id" FOREIGN KEY (ficountry_id) REFERENCES countries(ficountry_id) ON UPDATE CASCADE ON DELETE CASCADE
    "fk_geocities_region_id" FOREIGN KEY (firegion_id) REFERENCES regions(firegion_id) ON UPDATE CASCADE ON DELETE CASCADE
 
 
                Table "cities_name_words"
  Column   |          Type          | Modifiers | Description
-----------+------------------------+-----------+-------------
 ficity_id | integer                | not null  |
 fsword    | character varying(200) | not null  |
Indexes:
    "idx_cities_name_words_city_id" btree (ficity_id)
    "idx_cities_name_words_word" btree (fsword varchar_pattern_ops)
Foreign-key constraints:
    "fk_cities_name_words_city_id" FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE
 
 
                Table "cities_name_ru_words"
  Column   |          Type          | Modifiers | Description
-----------+------------------------+-----------+-------------
 ficity_id | integer                | not null  |
 fsword    | character varying(200) | not null  |
Indexes:
    "idx_cities_name_ru_words_city_id" btree (ficity_id)
    "idx_cities_name_ru_words_word" btree (fsword varchar_pattern_ops)
Foreign-key constraints:
    "fk_cities_name_ru_words_city_id" FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE
 
This is the part of geo location database. The purpose of cities_name_words and cities_name_ru_words is to facilitate indexing on separate words in city name - they contain words of fsname or fsname_ru respectively of the corresponding record in cities if it has more than word. Cities has about 190000 records, cities_name_words about 80000 and cities_name_ru_words about 5000. Now the query for city by name looks like this:
 
select *
  from cities
 where  ( ficity_id in (
               select ficity_id from cities_name_words
                where fsword like 'novgorod%'
                union
               select ficity_id from cities_name_ru_words
                where fsword like 'novgorod%'
          )    
          or lower(fsname) like 'novgorod%'
          or lower(fsname_ru) like 'novgorod%'
 )
 
QUERY PLAN
Seq Scan on cities  (cost=16.63..5949.26 rows=95014 width=60)
  Filter: ((hashed subplan) OR (lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  SubPlan
    ->  Unique  (cost=16.61..16.62 rows=2 width=4)
          ->  Sort  (cost=16.61..16.62 rows=2 width=4)
                Sort Key: cities_name_words.ficity_id
                ->  Append  (cost=0.00..16.60 rows=2 width=4)
                      ->  Index Scan using idx_cities_name_words_word on cities_name_words  (cost=0.00..8.31 rows=1 width=4)
                            Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
                            Filter: ((fsword)::text ~~ 'novgorod%'::text)
                      ->  Index Scan using idx_cities_name_ru_words_word on cities_name_ru_words  (cost=0.00..8.27 rows=1 width=4)
                            Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
                            Filter: ((fsword)::text ~~ 'novgorod%'::text)
 
Notice how it uses proper indexes in subplan and goes for sequence scan on the main table. If the where- conditions are applied separately it uses indexes as expected:
 
select *
  from cities
 where  ( lower(fsname) like 'novgorod%'
          or lower(fsname_ru) like 'novgorod%'
 )
 
QUERY PLAN
Bitmap Heap Scan on cities  (cost=8.57..12.59 rows=1 width=60)
  Recheck Cond: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  Filter: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  ->  BitmapOr  (cost=8.57..8.57 rows=1 width=0)
        ->  Bitmap Index Scan on idx_cities_name  (cost=0.00..4.29 rows=1 width=0)
              Index Cond: ((lower((fsname)::text) ~>=~ 'novgorod'::text) AND (lower((fsname)::text) ~<~ 'novgoroe'::text))
        ->  Bitmap Index Scan on idx_cities_name_ru  (cost=0.00..4.28 rows=1 width=0)
              Index Cond: ((lower((fsname_ru)::text) ~>=~ 'novgorod'::text) AND (lower((fsname_ru)::text) ~<~ 'novgoroe'::text))
 
 
select *
  from cities
 where  ( ficity_id in (
               select ficity_id from cities_name_words
                where fsword like 'novgorod%'
                union
               select ficity_id from cities_name_ru_words
                where fsword like 'novgorod%'
          )    
 )
 
QUERY PLAN
Nested Loop  (cost=16.61..33.24 rows=2 width=60)
  ->  Unique  (cost=16.61..16.62 rows=2 width=4)
        ->  Sort  (cost=16.61..16.62 rows=2 width=4)
              Sort Key: cities_name_words.ficity_id
              ->  Append  (cost=0.00..16.60 rows=2 width=4)
                    ->  Index Scan using idx_cities_name_words_word on cities_name_words  (cost=0.00..8.31 rows=1 width=4)
                          Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
                          Filter: ((fsword)::text ~~ 'novgorod%'::text)
                    ->  Index Scan using idx_cities_name_ru_words_word on cities_name_ru_words  (cost=0.00..8.27 rows=1 width=4)
                          Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
                          Filter: ((fsword)::text ~~ 'novgorod%'::text)
  ->  Index Scan using pk_geocities on cities  (cost=0.00..8.28 rows=1 width=60)
        Index Cond: (cities.ficity_id = cities_name_words.ficity_id)
 
So, why does it estimate the row count as 95000 and chooses the seq scan path in the first query (even with enable_seqscan = off)? What can be done to make it use the index?
 
Sincerely,
Viatcheslav

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: table size in 8.3
Next
From: gargoyle47
Date:
Subject: Service Account password