Thread: Slow queries salad ;)
Here is a simple test case for this strange behaviour : annonces=> CREATE TABLE test.current (id INTEGER PRIMARY KEY, description TEXT); INFO: CREATE TABLE / PRIMARY KEY creera un index implicite <<current_pkey>> pour la table <<current>> CREATE TABLE annonces=> CREATE TABLE test.archive (id INTEGER PRIMARY KEY, description TEXT); INFO: CREATE TABLE / PRIMARY KEY creera un index implicite <<archive_pkey>> pour la table <<archive>> CREATE TABLE annonces=> CREATE VIEW test.all AS SELECT * FROM test.archive UNION ALL SELECT * FROM test.current ; CREATE VIEW let's populate... annonces=> INSERT INTO test.current SELECT id, description FROM annonces; INSERT 0 11524 annonces=> INSERT INTO test.archive SELECT id, description FROM archive_ext; INSERT 0 437351 annonces=> ANALYZE test.archive; ANALYZE annonces=> ANALYZE test.current; ANALYZE This is the bookmarks table... SELECT count(*), list_id FROM bookmarks GROUP BY list_id; count | list_id -------+--------- 15 | 7 5 | 5 150 | 4 3 | 3 15 | 2 2 | 1 6 | 8 I want to list the stuff I bookmarked : annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT annonce_id FROM bookmarks WHERE list_id IN ('4')); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=6.58..532.84 rows=140 width=203) (actual time=0.747..5.283 rows=150 loops=1) Hash Cond: ("outer".id = "inner".annonce_id) -> Seq Scan on current (cost=0.00..467.24 rows=11524 width=203) (actual time=0.006..3.191 rows=11524 loops=1) -> Hash (cost=6.23..6.23 rows=140 width=4) (actual time=0.244..0.244 rows=150 loops=1) -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual time=0.155..0.184 rows=150 loops=1) -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) (actual time=0.008..0.097 rows=150 loops=1) Filter: (list_id = 4) Total runtime: 5.343 ms (8 lignes) annonces=> set enable_hashjoin TO 0; SET annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT annonce_id FROM bookmarks WHERE list_id IN ('4')); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=4.83..824.22 rows=140 width=203) (actual time=0.219..1.034 rows=150 loops=1) -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual time=0.158..0.199 rows=150 loops=1) -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) (actual time=0.011..0.096 rows=150 loops=1) Filter: (list_id = 4) -> Index Scan using current_pkey on current (cost=0.00..5.83 rows=1 width=203) (actual time=0.005..0.005 rows=1 loops=150) Index Cond: (current.id = "outer".annonce_id) Total runtime: 1.108 ms (7 lignes) Hm, the row estimates on the "bookmarks" table are spot on ; why did it choose the hash join ? Now, if I want to access the "all" view which contains the union of the "current" and "archive" table : annonces=> set enable_hashjoin TO 1; SET annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT annonce_id FROM bookmarks WHERE list_id IN ('4')); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=6.58..33484.41 rows=314397 width=36) (actual time=8300.484..8311.784 rows=150 loops=1) Hash Cond: ("outer"."?column1?" = "inner".annonce_id) -> Append (cost=0.00..23596.78 rows=449139 width=219) (actual time=6.390..8230.821 rows=448875 loops=1) -> Seq Scan on archive (cost=0.00..18638.15 rows=437615 width=219) (actual time=6.389..8175.491 rows=437351 loops=1) -> Seq Scan on current (cost=0.00..467.24 rows=11524 width=203) (actual time=0.022..8.985 rows=11524 loops=1) -> Hash (cost=6.23..6.23 rows=140 width=4) (actual time=0.255..0.255 rows=150 loops=1) -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual time=0.168..0.197 rows=150 loops=1) -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) (actual time=0.015..0.102 rows=150 loops=1) Filter: (list_id = 4) Total runtime: 8311.870 ms (10 lignes) annonces=> set enable_hashjoin TO 0; SET annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT annonce_id FROM bookmarks WHERE list_id IN ('4')); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=79604.61..84994.98 rows=314397 width=36) (actual time=6944.229..7109.371 rows=150 loops=1) Merge Cond: ("outer".annonce_id = "inner".id) -> Sort (cost=11.22..11.57 rows=140 width=4) (actual time=0.326..0.355 rows=150 loops=1) Sort Key: bookmarks.annonce_id -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual time=0.187..0.218 rows=150 loops=1) -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) (actual time=0.028..0.126 rows=150 loops=1) Filter: (list_id = 4) -> Sort (cost=79593.40..80716.25 rows=449139 width=36) (actual time=6789.786..7014.815 rows=448625 loops=1) Sort Key: "all".id -> Append (cost=0.00..23596.78 rows=449139 width=219) (actual time=0.013..391.447 rows=448875 loops=1) -> Seq Scan on archive (cost=0.00..18638.15 rows=437615 width=219) (actual time=0.013..332.353 rows=437351 loops=1) -> Seq Scan on current (cost=0.00..467.24 rows=11524 width=203) (actual time=0.013..8.396 rows=11524 loops=1) Total runtime: 37226.846 ms The IN() is quite small (150 values), but the two large tables are seq-scanned... is there a way to avoid this ? ------------------------------------------------------------------------------------------------------------------------------------- Another nitpick : let's redo the first query differently. annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT annonce_id FROM bookmarks WHERE list_id IN ('4')); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=6.58..532.84 rows=140 width=203) (actual time=0.794..5.791 rows=150 loops=1) Hash Cond: ("outer".id = "inner".annonce_id) -> Seq Scan on current (cost=0.00..467.24 rows=11524 width=203) (actual time=0.003..3.554 rows=11524 loops=1) -> Hash (cost=6.23..6.23 rows=140 width=4) (actual time=0.265..0.265 rows=150 loops=1) -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual time=0.179..0.210 rows=150 loops=1) -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) (actual time=0.021..0.102 rows=150 loops=1) Filter: (list_id = 4) Total runtime: 5.853 ms annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT DISTINCT annonce_id FROM bookmarks WHERE list_id IN ('4')) AS b WHERE a.id=b.annonce_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=12.37..538.63 rows=140 width=203) (actual time=0.812..5.362 rows=150 loops=1) Hash Cond: ("outer".id = "inner".annonce_id) -> Seq Scan on current a (cost=0.00..467.24 rows=11524 width=203) (actual time=0.005..3.227 rows=11524 loops=1) -> Hash (cost=12.02..12.02 rows=140 width=4) (actual time=0.296..0.296 rows=150 loops=1) -> Unique (cost=9.87..10.62 rows=140 width=4) (actual time=0.215..0.265 rows=150 loops=1) -> Sort (cost=9.87..10.25 rows=150 width=4) (actual time=0.215..0.226 rows=150 loops=1) Sort Key: bookmarks.annonce_id -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) (actual time=0.007..0.104 rows=150 loops=1) Filter: (list_id = 4) Total runtime: 5.429 ms Hm, it does Sort + Unique + Hash ; the Hash alone would have been better. Replacing DISTINCT with GROUP BY removes the sort. annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT annonce_id FROM bookmarks WHERE list_id IN ('4') GROUP BY annonce_id) AS b WHERE a.id=b.annonce_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=7.98..534.24 rows=140 width=203) (actual time=0.811..5.557 rows=150 loops=1) Hash Cond: ("outer".id = "inner".annonce_id) -> Seq Scan on current a (cost=0.00..467.24 rows=11524 width=203) (actual time=0.006..3.434 rows=11524 loops=1) -> Hash (cost=7.63..7.63 rows=140 width=4) (actual time=0.242..0.242 rows=150 loops=1) -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual time=0.156..0.186 rows=150 loops=1) -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) (actual time=0.008..0.097 rows=150 loops=1) Filter: (list_id = 4) Total runtime: 5.647 ms
PFC <lists@peufeu.com> writes: > The IN() is quite small (150 values), but the two large tables are > seq-scanned... is there a way to avoid this ? Not in 8.1. HEAD is a bit smarter about joins to Append relations. regards, tom lane
On Tue, Apr 25, 2006 at 07:53:15PM +0200, PFC wrote: What version is this?? > annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT > annonce_id FROM bookmarks WHERE list_id IN ('4')); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=6.58..532.84 rows=140 width=203) (actual > time=0.747..5.283 rows=150 loops=1) > Hash Cond: ("outer".id = "inner".annonce_id) > -> Seq Scan on current (cost=0.00..467.24 rows=11524 width=203) > (actual time=0.006..3.191 rows=11524 loops=1) > -> Hash (cost=6.23..6.23 rows=140 width=4) (actual time=0.244..0.244 > rows=150 loops=1) > -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual > time=0.155..0.184 rows=150 loops=1) > -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 > width=4) (actual time=0.008..0.097 rows=150 loops=1) > Filter: (list_id = 4) > Total runtime: 5.343 ms > (8 lignes) > > annonces=> set enable_hashjoin TO 0; > SET > annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT > annonce_id FROM bookmarks WHERE list_id IN ('4')); > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=4.83..824.22 rows=140 width=203) (actual > time=0.219..1.034 rows=150 loops=1) > -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual > time=0.158..0.199 rows=150 loops=1) > -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 width=4) > (actual time=0.011..0.096 rows=150 loops=1) > Filter: (list_id = 4) > -> Index Scan using current_pkey on current (cost=0.00..5.83 rows=1 > width=203) (actual time=0.005..0.005 rows=1 loops=150) > Index Cond: (current.id = "outer".annonce_id) > Total runtime: 1.108 ms > (7 lignes) > > Hm, the row estimates on the "bookmarks" table are spot on ; why did > it choose the hash join ? Because it thought it would be cheaper; see the estimates. Increasing effective_cache_size or decreasing random_page_cost would favor the index scan. > Now, if I want to access the "all" view which contains the union of > the "current" and "archive" table : > > annonces=> set enable_hashjoin TO 1; > SET > annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT > annonce_id FROM bookmarks WHERE list_id IN ('4')); > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=6.58..33484.41 rows=314397 width=36) (actual > time=8300.484..8311.784 rows=150 loops=1) > Hash Cond: ("outer"."?column1?" = "inner".annonce_id) > -> Append (cost=0.00..23596.78 rows=449139 width=219) (actual > time=6.390..8230.821 rows=448875 loops=1) > -> Seq Scan on archive (cost=0.00..18638.15 rows=437615 > width=219) (actual time=6.389..8175.491 rows=437351 loops=1) > -> Seq Scan on current (cost=0.00..467.24 rows=11524 width=203) > (actual time=0.022..8.985 rows=11524 loops=1) > -> Hash (cost=6.23..6.23 rows=140 width=4) (actual time=0.255..0.255 > rows=150 loops=1) > -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual > time=0.168..0.197 rows=150 loops=1) > -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 > width=4) (actual time=0.015..0.102 rows=150 loops=1) > Filter: (list_id = 4) > Total runtime: 8311.870 ms > (10 lignes) > > annonces=> set enable_hashjoin TO 0; > SET > annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT > annonce_id FROM bookmarks WHERE list_id IN ('4')); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------- > Merge Join (cost=79604.61..84994.98 rows=314397 width=36) (actual > time=6944.229..7109.371 rows=150 loops=1) > Merge Cond: ("outer".annonce_id = "inner".id) > -> Sort (cost=11.22..11.57 rows=140 width=4) (actual > time=0.326..0.355 rows=150 loops=1) > Sort Key: bookmarks.annonce_id > -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual > time=0.187..0.218 rows=150 loops=1) > -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 > width=4) (actual time=0.028..0.126 rows=150 loops=1) > Filter: (list_id = 4) > -> Sort (cost=79593.40..80716.25 rows=449139 width=36) (actual > time=6789.786..7014.815 rows=448625 loops=1) > Sort Key: "all".id > -> Append (cost=0.00..23596.78 rows=449139 width=219) (actual > time=0.013..391.447 rows=448875 loops=1) > -> Seq Scan on archive (cost=0.00..18638.15 rows=437615 > width=219) (actual time=0.013..332.353 rows=437351 loops=1) > -> Seq Scan on current (cost=0.00..467.24 rows=11524 > width=203) (actual time=0.013..8.396 rows=11524 loops=1) > Total runtime: 37226.846 ms > > The IN() is quite small (150 values), but the two large tables are > seq-scanned... is there a way to avoid this ? Possibly if you break the view apart... SELECT ... FROM current WHERE id IN (...) UNION ALL SELECT ... FROM archive WHERE id IN (...) > ------------------------------------------------------------------------------------------------------------------------------------- > > Another nitpick : let's redo the first query differently. > > annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT > annonce_id FROM bookmarks WHERE list_id IN ('4')); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=6.58..532.84 rows=140 width=203) (actual > time=0.794..5.791 rows=150 loops=1) > Hash Cond: ("outer".id = "inner".annonce_id) > -> Seq Scan on current (cost=0.00..467.24 rows=11524 width=203) > (actual time=0.003..3.554 rows=11524 loops=1) > -> Hash (cost=6.23..6.23 rows=140 width=4) (actual time=0.265..0.265 > rows=150 loops=1) > -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual > time=0.179..0.210 rows=150 loops=1) > -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 > width=4) (actual time=0.021..0.102 rows=150 loops=1) > Filter: (list_id = 4) > Total runtime: 5.853 ms > > annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT > DISTINCT annonce_id FROM bookmarks WHERE list_id IN ('4')) AS b WHERE > a.id=b.annonce_id; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=12.37..538.63 rows=140 width=203) (actual > time=0.812..5.362 rows=150 loops=1) > Hash Cond: ("outer".id = "inner".annonce_id) > -> Seq Scan on current a (cost=0.00..467.24 rows=11524 width=203) > (actual time=0.005..3.227 rows=11524 loops=1) > -> Hash (cost=12.02..12.02 rows=140 width=4) (actual > time=0.296..0.296 rows=150 loops=1) > -> Unique (cost=9.87..10.62 rows=140 width=4) (actual > time=0.215..0.265 rows=150 loops=1) > -> Sort (cost=9.87..10.25 rows=150 width=4) (actual > time=0.215..0.226 rows=150 loops=1) > Sort Key: bookmarks.annonce_id > -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 > width=4) (actual time=0.007..0.104 rows=150 loops=1) > Filter: (list_id = 4) > Total runtime: 5.429 ms > > Hm, it does Sort + Unique + Hash ; the Hash alone would have been > better. The hash alone wouldn't have gotten you the same output as a DISTINCT, though. > Replacing DISTINCT with GROUP BY removes the sort. > > annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT > annonce_id FROM bookmarks WHERE list_id IN ('4') GROUP BY annonce_id) AS b > WHERE a.id=b.annonce_id; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=7.98..534.24 rows=140 width=203) (actual > time=0.811..5.557 rows=150 loops=1) > Hash Cond: ("outer".id = "inner".annonce_id) > -> Seq Scan on current a (cost=0.00..467.24 rows=11524 width=203) > (actual time=0.006..3.434 rows=11524 loops=1) > -> Hash (cost=7.63..7.63 rows=140 width=4) (actual time=0.242..0.242 > rows=150 loops=1) > -> HashAggregate (cost=4.83..6.23 rows=140 width=4) (actual > time=0.156..0.186 rows=150 loops=1) > -> Seq Scan on bookmarks (cost=0.00..4.45 rows=150 > width=4) (actual time=0.008..0.097 rows=150 loops=1) > Filter: (list_id = 4) > Total runtime: 5.647 ms Uhm.. that actually runs slower... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461