Thread: Subplan and index usage
Consider the following case which is almost exact snapshot of part of our scheme:
Table "cities"
Column | Type | Modifiers | Description
--------------+------------------------+-----------+-------------
ficity_id | integer | not null |
ficountry_id | integer | |
firegion_id | integer | |
fsname | character varying(100) | |
fsname_ru | character varying(200) | |
Indexes:
"pk_geocities" PRIMARY KEY, btree (ficity_id)
"idx_cities_name" btree (lower(fsname::text) varchar_pattern_ops)
"idx_cities_name_ru" btree (lower(fsname_ru::text) varchar_pattern_ops)
"idx_geocities_country_id" btree (ficountry_id)
"idx_geocities_region_id" btree (firegion_id)
Foreign-key constraints:
"fk_geocities_country_id" FOREIGN KEY (ficountry_id) REFERENCES countries(ficountry_id) ON UPDATE CASCADE ON DELETE CASCADE
"fk_geocities_region_id" FOREIGN KEY (firegion_id) REFERENCES regions(firegion_id) ON UPDATE CASCADE ON DELETE CASCADE
Column | Type | Modifiers | Description
--------------+------------------------+-----------+-------------
ficity_id | integer | not null |
ficountry_id | integer | |
firegion_id | integer | |
fsname | character varying(100) | |
fsname_ru | character varying(200) | |
Indexes:
"pk_geocities" PRIMARY KEY, btree (ficity_id)
"idx_cities_name" btree (lower(fsname::text) varchar_pattern_ops)
"idx_cities_name_ru" btree (lower(fsname_ru::text) varchar_pattern_ops)
"idx_geocities_country_id" btree (ficountry_id)
"idx_geocities_region_id" btree (firegion_id)
Foreign-key constraints:
"fk_geocities_country_id" FOREIGN KEY (ficountry_id) REFERENCES countries(ficountry_id) ON UPDATE CASCADE ON DELETE CASCADE
"fk_geocities_region_id" FOREIGN KEY (firegion_id) REFERENCES regions(firegion_id) ON UPDATE CASCADE ON DELETE CASCADE
Table "cities_name_words"
Column | Type | Modifiers | Description
-----------+------------------------+-----------+-------------
ficity_id | integer | not null |
fsword | character varying(200) | not null |
Indexes:
"idx_cities_name_words_city_id" btree (ficity_id)
"idx_cities_name_words_word" btree (fsword varchar_pattern_ops)
Foreign-key constraints:
"fk_cities_name_words_city_id" FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE
Column | Type | Modifiers | Description
-----------+------------------------+-----------+-------------
ficity_id | integer | not null |
fsword | character varying(200) | not null |
Indexes:
"idx_cities_name_words_city_id" btree (ficity_id)
"idx_cities_name_words_word" btree (fsword varchar_pattern_ops)
Foreign-key constraints:
"fk_cities_name_words_city_id" FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE
Table "cities_name_ru_words"
Column | Type | Modifiers | Description
-----------+------------------------+-----------+-------------
ficity_id | integer | not null |
fsword | character varying(200) | not null |
Indexes:
"idx_cities_name_ru_words_city_id" btree (ficity_id)
"idx_cities_name_ru_words_word" btree (fsword varchar_pattern_ops)
Foreign-key constraints:
"fk_cities_name_ru_words_city_id" FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE
Column | Type | Modifiers | Description
-----------+------------------------+-----------+-------------
ficity_id | integer | not null |
fsword | character varying(200) | not null |
Indexes:
"idx_cities_name_ru_words_city_id" btree (ficity_id)
"idx_cities_name_ru_words_word" btree (fsword varchar_pattern_ops)
Foreign-key constraints:
"fk_cities_name_ru_words_city_id" FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE
This is the part of geo location database. The purpose of cities_name_words and cities_name_ru_words is to facilitate indexing on separate words in city name - they contain words of fsname or fsname_ru respectively of the corresponding record in cities if it has more than word. Cities has about 190000 records, cities_name_words about 80000 and cities_name_ru_words about 5000. Now the query for city by name looks like this:
select *
from cities
where ( ficity_id in (
select ficity_id from cities_name_words
where fsword like 'novgorod%'
union
select ficity_id from cities_name_ru_words
where fsword like 'novgorod%'
)
or lower(fsname) like 'novgorod%'
or lower(fsname_ru) like 'novgorod%'
)
from cities
where ( ficity_id in (
select ficity_id from cities_name_words
where fsword like 'novgorod%'
union
select ficity_id from cities_name_ru_words
where fsword like 'novgorod%'
)
or lower(fsname) like 'novgorod%'
or lower(fsname_ru) like 'novgorod%'
)
QUERY PLAN
Seq Scan on cities (cost=16.63..5949.26 rows=95014 width=60)
Filter: ((hashed subplan) OR (lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
SubPlan
-> Unique (cost=16.61..16.62 rows=2 width=4)
-> Sort (cost=16.61..16.62 rows=2 width=4)
Sort Key: cities_name_words.ficity_id
-> Append (cost=0.00..16.60 rows=2 width=4)
-> Index Scan using idx_cities_name_words_word on cities_name_words (cost=0.00..8.31 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
-> Index Scan using idx_cities_name_ru_words_word on cities_name_ru_words (cost=0.00..8.27 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
Seq Scan on cities (cost=16.63..5949.26 rows=95014 width=60)
Filter: ((hashed subplan) OR (lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
SubPlan
-> Unique (cost=16.61..16.62 rows=2 width=4)
-> Sort (cost=16.61..16.62 rows=2 width=4)
Sort Key: cities_name_words.ficity_id
-> Append (cost=0.00..16.60 rows=2 width=4)
-> Index Scan using idx_cities_name_words_word on cities_name_words (cost=0.00..8.31 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
-> Index Scan using idx_cities_name_ru_words_word on cities_name_ru_words (cost=0.00..8.27 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
Notice how it uses proper indexes in subplan and goes for sequence scan on the main table. If the where- conditions are applied separately it uses indexes as expected:
select *
from cities
where ( lower(fsname) like 'novgorod%'
or lower(fsname_ru) like 'novgorod%'
)
from cities
where ( lower(fsname) like 'novgorod%'
or lower(fsname_ru) like 'novgorod%'
)
QUERY PLAN
Bitmap Heap Scan on cities (cost=8.57..12.59 rows=1 width=60)
Recheck Cond: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
Filter: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
-> BitmapOr (cost=8.57..8.57 rows=1 width=0)
-> Bitmap Index Scan on idx_cities_name (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((lower((fsname)::text) ~>=~ 'novgorod'::text) AND (lower((fsname)::text) ~<~ 'novgoroe'::text))
-> Bitmap Index Scan on idx_cities_name_ru (cost=0.00..4.28 rows=1 width=0)
Index Cond: ((lower((fsname_ru)::text) ~>=~ 'novgorod'::text) AND (lower((fsname_ru)::text) ~<~ 'novgoroe'::text))
Bitmap Heap Scan on cities (cost=8.57..12.59 rows=1 width=60)
Recheck Cond: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
Filter: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
-> BitmapOr (cost=8.57..8.57 rows=1 width=0)
-> Bitmap Index Scan on idx_cities_name (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((lower((fsname)::text) ~>=~ 'novgorod'::text) AND (lower((fsname)::text) ~<~ 'novgoroe'::text))
-> Bitmap Index Scan on idx_cities_name_ru (cost=0.00..4.28 rows=1 width=0)
Index Cond: ((lower((fsname_ru)::text) ~>=~ 'novgorod'::text) AND (lower((fsname_ru)::text) ~<~ 'novgoroe'::text))
select *
from cities
where ( ficity_id in (
select ficity_id from cities_name_words
where fsword like 'novgorod%'
union
select ficity_id from cities_name_ru_words
where fsword like 'novgorod%'
)
)
from cities
where ( ficity_id in (
select ficity_id from cities_name_words
where fsword like 'novgorod%'
union
select ficity_id from cities_name_ru_words
where fsword like 'novgorod%'
)
)
QUERY PLAN
Nested Loop (cost=16.61..33.24 rows=2 width=60)
-> Unique (cost=16.61..16.62 rows=2 width=4)
-> Sort (cost=16.61..16.62 rows=2 width=4)
Sort Key: cities_name_words.ficity_id
-> Append (cost=0.00..16.60 rows=2 width=4)
-> Index Scan using idx_cities_name_words_word on cities_name_words (cost=0.00..8.31 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
-> Index Scan using idx_cities_name_ru_words_word on cities_name_ru_words (cost=0.00..8.27 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
-> Index Scan using pk_geocities on cities (cost=0.00..8.28 rows=1 width=60)
Index Cond: (cities.ficity_id = cities_name_words.ficity_id)
Nested Loop (cost=16.61..33.24 rows=2 width=60)
-> Unique (cost=16.61..16.62 rows=2 width=4)
-> Sort (cost=16.61..16.62 rows=2 width=4)
Sort Key: cities_name_words.ficity_id
-> Append (cost=0.00..16.60 rows=2 width=4)
-> Index Scan using idx_cities_name_words_word on cities_name_words (cost=0.00..8.31 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
-> Index Scan using idx_cities_name_ru_words_word on cities_name_ru_words (cost=0.00..8.27 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
-> Index Scan using pk_geocities on cities (cost=0.00..8.28 rows=1 width=60)
Index Cond: (cities.ficity_id = cities_name_words.ficity_id)
So, why does it estimate the row count as 95000 and chooses the seq scan path in the first query (even with enable_seqscan = off)? What can be done to make it use the index?
Sincerely,
Viatcheslav
"Vyacheslav Kalinin" <vka@mgcp.com> writes: > [ poor estimation for ] > select * > from cities > where ( ficity_id in ( > select ficity_id from cities_name_words > where fsword like 'novgorod%' > union > select ficity_id from cities_name_ru_words > where fsword like 'novgorod%' > ) > or lower(fsname) like 'novgorod%' > or lower(fsname_ru) like 'novgorod%' > ) When you have an IN at the top level of WHERE, it's flattened into a kind of JOIN, and there's reasonably decent estimation of the selectivity. Unfortunately, when it's down inside an OR-clause like this, all those smarts go out the window and it's just treated as a generic subplan condition, with 0.5 estimated selectivity IIRC. Improving that would be nice but it's not high on anyone's to-do list. You might get better results if you combine three separate queries with UNION. regards, tom lane