Thread: BUG #16504: Wrapping query in EXISTS() causes sequential scans of tables
BUG #16504: Wrapping query in EXISTS() causes sequential scans of tables
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16504 Logged by: Drk Shadow Email address: postgresql.20.drkshadow@spamgourmet.com PostgreSQL version: 12.0 Operating system: Gentoo Description: I'm trying to migrate from Postgresql 9.6, but running into performance issues. In this case, I have a query that selects from a table, joins a table, joins that same table, joins the first table, performs a NOT firsttable.value=lasttable.value to make sure that I'm not joining an item with itself. There are indexes for the conditions in each join. When I run this as a select query, the EXPLAIN output uses index scans the whole way through. When I SELECT EXISTS(thatquery) FROM firsttable WHERE firsttable.id IN (1,2,3), the EXPLAIN output switches to using sequential scans. This destroys performance on this 3.8m row table. The real case, with a view and multiple queries joined together, trying to determine if one item (or its preview) has a tag anywhere: https://www.db-fiddle.com/f/efSF4GbPEbAAUQWRnRCkEp/5 The example case, where I compare an EXISTS() to a flat select: https://www.db-fiddle.com/f/f96LRpJQF3VFGZo9EP8nMm/0 ======= The SQL for the example, in case db-fiddle goes away CREATE TABLE boxes ( itemid INTEGER PRIMARY KEY, size INTEGER DEFAULT 0, md5 CHARACTER(22), prevoid oid DEFAULT 0, prevlocal BOOLEAN DEFAULT FALSE ); CREATE INDEX prevoid ON boxes (prevoid) WHERE prevoid > 0; CREATE INDEX boxhash ON boxes(size, SUBSTR(md5, 1, 3)); -- your preview is stored in another location.. CREATE TABLE prevs ( id INTEGER PRIMARY KEY, size INTEGER NOT NULL, md5 CHARACTER(22) NOT NULL -- location CHARACTER... ); CREATE INDEX prevhash ON prevs(size, SUBSTR(md5, 1, 3)); -- ratings CREATE TABLE box_tags ( size INTEGER NOT NULL, md5 CHARACTER(22) NOT NULL, value INTEGER NOT NULL ); CREATE INDEX taghashes ON box_tags(size, SUBSTR(md5, 1, 3)); CREATE VIEW other_location AS SELECT bx_a.itemid, bx_a.prevlocal AS local, bx_other_a.itemid AS bitemid, bt_a.value from boxes bx_a JOIN prevs pv_a ON pv_a.id=(bx_a.prevoid)::INTEGER AND bx_a.prevlocal JOIN prevs pv1_a ON (pv1_a.size,SUBSTR(pv1_a.md5, 1, 3),pv1_a.md5)=(pv_a.size, SUBSTR(pv_a.md5, 1, 3),pv_a.md5) JOIN boxes bx_other_a ON bx_other_a.prevoid=(pv1_a.id)::oid AND bx_other_a.prevlocal LEFT JOIN box_tags bt_a ON (bt_a.size,SUBSTR(bt_a.md5, 1, 3), bt_a.md5)=(pv1_a.size,SUBSTR(pv1_a.md5, 1, 3), pv1_a.md5) WHERE bx_other_a.prevoid > 0 AND NOT bx_a.itemid = bx_other_a.itemid ; -- ---- -- The examples EXPLAIN SELECT EXISTS(select 1 FROM other_location WHERE itemid=boxes.itemid) FROM boxes WHERE boxes.itemid IN (1,2,3); ---- Not copy-pasting the EXPLAIN output, the first query has two subplans. The first subplan uses indexes, and the second subplan uses -> Seq Scan on boxes bx_a_1 (cost=0.00..22.10 rows=605 width=8). The second query has no second subplan, and there is no sequential table scan. It appears there is a bug: EXISTS() subqueries are not handled as part of the query, or perhaps they're done in parallel (at extreme cost), or perhaps something else. In this case, there's no indication of a parallel branch, and `SET max_parallel_workers_per_gather = 0;` does not change the plan. I've noticed this problem as well with JOIN (SELECT...) subqueries (but I didn't make an example), so I believe this isn't about EXISTS(), but rather about _all_ subqueries now being a performance barrier. As such, I can't migrate anything with subqueries to Postgres 12. Is this a bug? EXPLAIN SELECT other_location.bitemid FROM boxes JOIN other_location ON other_location.itemid=boxes.itemid WHERE boxes.itemid IN (1,2,3);
PG Bug reporting form <noreply@postgresql.org> writes: > In this case, I have a query that selects from a table, joins a table, joins > that same table, joins the first table, performs a NOT > firsttable.value=lasttable.value to make sure that I'm not joining an item > with itself. There are indexes for the conditions in each join. When I run > this as a select query, the EXPLAIN output uses index scans the whole way > through. When I SELECT EXISTS(thatquery) FROM firsttable WHERE firsttable.id > IN (1,2,3), the EXPLAIN output switches to using sequential scans. This > destroys performance on this 3.8m row table. Are you claiming that 9.6 did better? I'm not aware that we changed anything significant about how a sub-select in a query's select list works. The table/view definitions alone are not enough to investigate this, since it's unlikely that we'd get the same plans on an empty table as a populated one. Please see if you can make a self-contained test case with some dummy data that reproduces the problem (ie better plan on 9.6 than later). regards, tom lane