Thread: BUG #1435: Optimizer not using index on large tables when inner joining two views
BUG #1435: Optimizer not using index on large tables when inner joining two views
From
"Yary Hluchan"
Date:
The following bug has been logged online: Bug reference: 1435 Logged by: Yary Hluchan Email address: not.com@gmail.com PostgreSQL version: 8.0.0 Operating system: OpenBSD 3.6 Description: Optimizer not using index on large tables when inner joining two views Details: There's a bit of background here, please bear with me- The PostgreSQL optimizer is failing with some queries on the MusicBrainz database ( http://www.musicbrainz.org/products/server/download.html - though it will be easier to get a data dump directly from me). The base tables involved are: moderation_open : 6047 rows moderation_closed : 1502267 rows vote_open : 13811 rows vote_closed : 2116153 rows The only difference between the "open" and "closed" tables are the data stored within them- other than the titles, their definitions and indexes are the same. The vote tables have a column "moderation" which join with the moderation tables' "id" column. The vote tables also have a column "moderator" which represents the voter. All tables have other columns which aren't relevant for this bug report. CREATE TABLE moderation_<open,closed> ( id INTEGER NOT NULL, ... ) CREATE TABLE vote_<open,closed> ( id INTEGER NOT NULL, moderator INTEGER NOT NULL, -- references moderator moderation INTEGER NOT NULL, -- references moderation ... ) There are views, "moderation_all" and "vote_all", that combine the data in the open & closed tables CREATE VIEW moderation_all AS SELECT * FROM moderation_open UNION ALL SELECT * FROM moderation_closed; CREATE VIEW vote_all AS SELECT * FROM vote_open UNION ALL SELECT * FROM vote_closed; The bug appears with this query: SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote FROM moderation_all m INNER JOIN vote_all v ON v.moderation = m.id AND v.moderator = 85981 AND NOT v.superseded ORDER BY m.id DESC I've put it and the query plan at http://yary.ack.org/qp.txt (all query plans shown are from a newly vacuumed and analyzed pg 8.0.0 database). Query plan 1 shows sequential scans on the moderation_* tables If I SET ENABLE_SEQSCAN TO OFF the new plan still does sequential scans on the moderation tables, and still takes about ten minutes wall time on my system. See Query Plan 2. I then set ENABLE_SEQSCAN TO ON and run a test to see if there is a problem with the indexes on the moderation tables- explain SELECT m.* FROM moderation_all m where m.id = 2045296; it indeed does use the indexes, and the query returns quickly. (See "test qp 3") I then try some rewrites of the query that don't use the moderation_all view. The rewrites return identical data as the original and run very quickly, and don't have optimizer plans using sequential scans on the base tables. I've tried to make them identical to the original query. rewrite 1- SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote FROM moderation_open m INNER JOIN vote_all v ON v.moderation = m.id AND v.moderator = 85981 AND NOT v.superseded UNION ALL SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote FROM moderation_closed m INNER JOIN vote_all v ON v.moderation = m.id AND v.moderator = 85981 AND NOT v.superseded ORDER BY id DESC My reading of it is that the optimizer isn't applying a distributive rule: condition(a union b) == condition(a) union condition(b). It knows the subquery is going to be compared with m.id, but it doesn't seem to try moving that comparison into the component subqueries. Or maybe there's some re-arrangement of the query tree it's just plain missing. I'll limit my speculation. I can provide any data dumps or run any tests you request. Thanks for looking -y
Re: BUG #1435: Optimizer not using index on large tables when inner joining two views
From
Tom Lane
Date:
"Yary Hluchan" <not.com@gmail.com> writes: > My reading of it is that the optimizer isn't applying a distributive rule: > condition(a union b) == condition(a) union condition(b). No, the correct reading is that a UNION subquery is planned independently of the surrounding query. This is not likely to change soon, as it would involve some rather wholesale changes. My advice is to lose the open/closed distinction and fold those tables into single tables with an extra boolean flag column. I suppose the point of this database layout is to provide fast access to the "open" subsets, but you could probably achieve that by making partial indexes, eg create index vote_open_ids on vote (id) where is_open; regards, tom lane