Thread: Slow queries salad ;)

Slow queries salad ;)

From
PFC
Date:
    Here is a simple test case for this strange behaviour :

annonces=> CREATE TABLE test.current (id INTEGER PRIMARY KEY, description
TEXT);
INFO:  CREATE TABLE / PRIMARY KEY creera un index implicite
<<current_pkey>> pour la table <<current>>
CREATE TABLE

annonces=> CREATE TABLE test.archive (id INTEGER PRIMARY KEY, description
TEXT);
INFO:  CREATE TABLE / PRIMARY KEY creera un index implicite
<<archive_pkey>> pour la table <<archive>>
CREATE TABLE

annonces=> CREATE VIEW test.all AS SELECT * FROM test.archive UNION ALL
SELECT * FROM test.current ;
CREATE VIEW

    let's populate...

annonces=> INSERT INTO test.current SELECT id, description FROM annonces;
INSERT 0 11524
annonces=> INSERT INTO test.archive SELECT id, description FROM
archive_ext;
INSERT 0 437351

annonces=> ANALYZE test.archive;
ANALYZE
annonces=> ANALYZE test.current;
ANALYZE

    This is the bookmarks table...

SELECT count(*), list_id FROM bookmarks GROUP BY list_id;
  count | list_id
-------+---------
     15 |       7
      5 |       5
    150 |       4
      3 |       3
     15 |       2
      2 |       1
      6 |       8

    I want to list the stuff I bookmarked :


annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT
annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=6.58..532.84 rows=140 width=203) (actual
time=0.747..5.283 rows=150 loops=1)
    Hash Cond: ("outer".id = "inner".annonce_id)
    ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)
(actual time=0.006..3.191 rows=11524 loops=1)
    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.244..0.244
rows=150 loops=1)
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
time=0.155..0.184 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
width=4) (actual time=0.008..0.097 rows=150 loops=1)
                      Filter: (list_id = 4)
  Total runtime: 5.343 ms
(8 lignes)

annonces=> set enable_hashjoin TO 0;
SET
annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT
annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                                            QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=4.83..824.22 rows=140 width=203) (actual
time=0.219..1.034 rows=150 loops=1)
    ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
time=0.158..0.199 rows=150 loops=1)
          ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150 width=4)
(actual time=0.011..0.096 rows=150 loops=1)
                Filter: (list_id = 4)
    ->  Index Scan using current_pkey on current  (cost=0.00..5.83 rows=1
width=203) (actual time=0.005..0.005 rows=1 loops=150)
          Index Cond: (current.id = "outer".annonce_id)
  Total runtime: 1.108 ms
(7 lignes)

    Hm, the row estimates on the "bookmarks" table are spot on ; why did it
choose the hash join ?

    Now, if I want to access the "all" view which contains the union of the
"current" and "archive" table :

annonces=> set enable_hashjoin TO 1;
SET
annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT
annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                                            QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=6.58..33484.41 rows=314397 width=36) (actual
time=8300.484..8311.784 rows=150 loops=1)
    Hash Cond: ("outer"."?column1?" = "inner".annonce_id)
    ->  Append  (cost=0.00..23596.78 rows=449139 width=219) (actual
time=6.390..8230.821 rows=448875 loops=1)
          ->  Seq Scan on archive  (cost=0.00..18638.15 rows=437615
width=219) (actual time=6.389..8175.491 rows=437351 loops=1)
          ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)
(actual time=0.022..8.985 rows=11524 loops=1)
    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.255..0.255
rows=150 loops=1)
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
time=0.168..0.197 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
width=4) (actual time=0.015..0.102 rows=150 loops=1)
                      Filter: (list_id = 4)
  Total runtime: 8311.870 ms
(10 lignes)

annonces=> set enable_hashjoin TO 0;
SET
annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT
annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                                              QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=79604.61..84994.98 rows=314397 width=36) (actual
time=6944.229..7109.371 rows=150 loops=1)
    Merge Cond: ("outer".annonce_id = "inner".id)
    ->  Sort  (cost=11.22..11.57 rows=140 width=4) (actual
time=0.326..0.355 rows=150 loops=1)
          Sort Key: bookmarks.annonce_id
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
time=0.187..0.218 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
width=4) (actual time=0.028..0.126 rows=150 loops=1)
                      Filter: (list_id = 4)
    ->  Sort  (cost=79593.40..80716.25 rows=449139 width=36) (actual
time=6789.786..7014.815 rows=448625 loops=1)
          Sort Key: "all".id
          ->  Append  (cost=0.00..23596.78 rows=449139 width=219) (actual
time=0.013..391.447 rows=448875 loops=1)
                ->  Seq Scan on archive  (cost=0.00..18638.15 rows=437615
width=219) (actual time=0.013..332.353 rows=437351 loops=1)
                ->  Seq Scan on current  (cost=0.00..467.24 rows=11524
width=203) (actual time=0.013..8.396 rows=11524 loops=1)
  Total runtime: 37226.846 ms

    The IN() is quite small (150 values), but the two large tables are
seq-scanned... is there a way to avoid this ?



-------------------------------------------------------------------------------------------------------------------------------------

    Another nitpick : let's redo the first query differently.

annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT
annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=6.58..532.84 rows=140 width=203) (actual
time=0.794..5.791 rows=150 loops=1)
    Hash Cond: ("outer".id = "inner".annonce_id)
    ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)
(actual time=0.003..3.554 rows=11524 loops=1)
    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.265..0.265
rows=150 loops=1)
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
time=0.179..0.210 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
width=4) (actual time=0.021..0.102 rows=150 loops=1)
                      Filter: (list_id = 4)
  Total runtime: 5.853 ms

annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT
DISTINCT annonce_id FROM bookmarks WHERE list_id IN ('4')) AS b WHERE
a.id=b.annonce_id;
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=12.37..538.63 rows=140 width=203) (actual
time=0.812..5.362 rows=150 loops=1)
    Hash Cond: ("outer".id = "inner".annonce_id)
    ->  Seq Scan on current a  (cost=0.00..467.24 rows=11524 width=203)
(actual time=0.005..3.227 rows=11524 loops=1)
    ->  Hash  (cost=12.02..12.02 rows=140 width=4) (actual
time=0.296..0.296 rows=150 loops=1)
          ->  Unique  (cost=9.87..10.62 rows=140 width=4) (actual
time=0.215..0.265 rows=150 loops=1)
                ->  Sort  (cost=9.87..10.25 rows=150 width=4) (actual
time=0.215..0.226 rows=150 loops=1)
                      Sort Key: bookmarks.annonce_id
                      ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
width=4) (actual time=0.007..0.104 rows=150 loops=1)
                            Filter: (list_id = 4)
  Total runtime: 5.429 ms

    Hm, it does Sort + Unique + Hash ; the Hash alone would have been better.
    Replacing DISTINCT with GROUP BY removes the sort.

annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT
annonce_id FROM bookmarks WHERE list_id IN ('4') GROUP BY annonce_id) AS b
WHERE a.id=b.annonce_id;
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=7.98..534.24 rows=140 width=203) (actual
time=0.811..5.557 rows=150 loops=1)
    Hash Cond: ("outer".id = "inner".annonce_id)
    ->  Seq Scan on current a  (cost=0.00..467.24 rows=11524 width=203)
(actual time=0.006..3.434 rows=11524 loops=1)
    ->  Hash  (cost=7.63..7.63 rows=140 width=4) (actual time=0.242..0.242
rows=150 loops=1)
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
time=0.156..0.186 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
width=4) (actual time=0.008..0.097 rows=150 loops=1)
                      Filter: (list_id = 4)
  Total runtime: 5.647 ms











Re: Slow queries salad ;)

From
Tom Lane
Date:
PFC <lists@peufeu.com> writes:
>     The IN() is quite small (150 values), but the two large tables are
> seq-scanned... is there a way to avoid this ?

Not in 8.1.  HEAD is a bit smarter about joins to Append relations.

            regards, tom lane

Re: Slow queries salad ;)

From
"Jim C. Nasby"
Date:
On Tue, Apr 25, 2006 at 07:53:15PM +0200, PFC wrote:
What version is this??

> annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT
> annonce_id FROM bookmarks WHERE list_id IN ('4'));
>                                                        QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=6.58..532.84 rows=140 width=203) (actual
> time=0.747..5.283 rows=150 loops=1)
>    Hash Cond: ("outer".id = "inner".annonce_id)
>    ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)
> (actual time=0.006..3.191 rows=11524 loops=1)
>    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.244..0.244
> rows=150 loops=1)
>          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
> time=0.155..0.184 rows=150 loops=1)
>                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
> width=4) (actual time=0.008..0.097 rows=150 loops=1)
>                      Filter: (list_id = 4)
>  Total runtime: 5.343 ms
> (8 lignes)
>
> annonces=> set enable_hashjoin TO 0;
> SET
> annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT
> annonce_id FROM bookmarks WHERE list_id IN ('4'));
>                                                            QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=4.83..824.22 rows=140 width=203) (actual
> time=0.219..1.034 rows=150 loops=1)
>    ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
> time=0.158..0.199 rows=150 loops=1)
>          ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150 width=4)
> (actual time=0.011..0.096 rows=150 loops=1)
>                Filter: (list_id = 4)
>    ->  Index Scan using current_pkey on current  (cost=0.00..5.83 rows=1
> width=203) (actual time=0.005..0.005 rows=1 loops=150)
>          Index Cond: (current.id = "outer".annonce_id)
>  Total runtime: 1.108 ms
> (7 lignes)
>
>     Hm, the row estimates on the "bookmarks" table are spot on ; why did
>     it  choose the hash join ?

Because it thought it would be cheaper; see the estimates. Increasing
effective_cache_size or decreasing random_page_cost would favor the
index scan.

>     Now, if I want to access the "all" view which contains the union of
>     the  "current" and "archive" table :
>
> annonces=> set enable_hashjoin TO 1;
> SET
> annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT
> annonce_id FROM bookmarks WHERE list_id IN ('4'));
>                                                            QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=6.58..33484.41 rows=314397 width=36) (actual
> time=8300.484..8311.784 rows=150 loops=1)
>    Hash Cond: ("outer"."?column1?" = "inner".annonce_id)
>    ->  Append  (cost=0.00..23596.78 rows=449139 width=219) (actual
> time=6.390..8230.821 rows=448875 loops=1)
>          ->  Seq Scan on archive  (cost=0.00..18638.15 rows=437615
> width=219) (actual time=6.389..8175.491 rows=437351 loops=1)
>          ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)
> (actual time=0.022..8.985 rows=11524 loops=1)
>    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.255..0.255
> rows=150 loops=1)
>          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
> time=0.168..0.197 rows=150 loops=1)
>                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
> width=4) (actual time=0.015..0.102 rows=150 loops=1)
>                      Filter: (list_id = 4)
>  Total runtime: 8311.870 ms
> (10 lignes)
>
> annonces=> set enable_hashjoin TO 0;
> SET
> annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT
> annonce_id FROM bookmarks WHERE list_id IN ('4'));
>                                                              QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=79604.61..84994.98 rows=314397 width=36) (actual
> time=6944.229..7109.371 rows=150 loops=1)
>    Merge Cond: ("outer".annonce_id = "inner".id)
>    ->  Sort  (cost=11.22..11.57 rows=140 width=4) (actual
> time=0.326..0.355 rows=150 loops=1)
>          Sort Key: bookmarks.annonce_id
>          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
> time=0.187..0.218 rows=150 loops=1)
>                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
> width=4) (actual time=0.028..0.126 rows=150 loops=1)
>                      Filter: (list_id = 4)
>    ->  Sort  (cost=79593.40..80716.25 rows=449139 width=36) (actual
> time=6789.786..7014.815 rows=448625 loops=1)
>          Sort Key: "all".id
>          ->  Append  (cost=0.00..23596.78 rows=449139 width=219) (actual
> time=0.013..391.447 rows=448875 loops=1)
>                ->  Seq Scan on archive  (cost=0.00..18638.15 rows=437615
> width=219) (actual time=0.013..332.353 rows=437351 loops=1)
>                ->  Seq Scan on current  (cost=0.00..467.24 rows=11524
> width=203) (actual time=0.013..8.396 rows=11524 loops=1)
>  Total runtime: 37226.846 ms
>
>     The IN() is quite small (150 values), but the two large tables are
> seq-scanned... is there a way to avoid this ?

Possibly if you break the view apart...

SELECT ... FROM current WHERE id IN (...)
UNION ALL
SELECT ... FROM archive WHERE id IN (...)

>
-------------------------------------------------------------------------------------------------------------------------------------
>
>     Another nitpick : let's redo the first query differently.
>
> annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT
> annonce_id FROM bookmarks WHERE list_id IN ('4'));
>                                                        QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=6.58..532.84 rows=140 width=203) (actual
> time=0.794..5.791 rows=150 loops=1)
>    Hash Cond: ("outer".id = "inner".annonce_id)
>    ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)
> (actual time=0.003..3.554 rows=11524 loops=1)
>    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.265..0.265
> rows=150 loops=1)
>          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
> time=0.179..0.210 rows=150 loops=1)
>                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
> width=4) (actual time=0.021..0.102 rows=150 loops=1)
>                      Filter: (list_id = 4)
>  Total runtime: 5.853 ms
>
> annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT
> DISTINCT annonce_id FROM bookmarks WHERE list_id IN ('4')) AS b WHERE
> a.id=b.annonce_id;
>                                                           QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=12.37..538.63 rows=140 width=203) (actual
> time=0.812..5.362 rows=150 loops=1)
>    Hash Cond: ("outer".id = "inner".annonce_id)
>    ->  Seq Scan on current a  (cost=0.00..467.24 rows=11524 width=203)
> (actual time=0.005..3.227 rows=11524 loops=1)
>    ->  Hash  (cost=12.02..12.02 rows=140 width=4) (actual
> time=0.296..0.296 rows=150 loops=1)
>          ->  Unique  (cost=9.87..10.62 rows=140 width=4) (actual
> time=0.215..0.265 rows=150 loops=1)
>                ->  Sort  (cost=9.87..10.25 rows=150 width=4) (actual
> time=0.215..0.226 rows=150 loops=1)
>                      Sort Key: bookmarks.annonce_id
>                      ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
> width=4) (actual time=0.007..0.104 rows=150 loops=1)
>                            Filter: (list_id = 4)
>  Total runtime: 5.429 ms
>
>     Hm, it does Sort + Unique + Hash ; the Hash alone would have been
>     better.

The hash alone wouldn't have gotten you the same output as a DISTINCT,
though.

>     Replacing DISTINCT with GROUP BY removes the sort.
>
> annonces=> EXPLAIN ANALYZE SELECT a.* FROM test.current a, (SELECT
> annonce_id FROM bookmarks WHERE list_id IN ('4') GROUP BY annonce_id) AS b
> WHERE a.id=b.annonce_id;
>                                                        QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=7.98..534.24 rows=140 width=203) (actual
> time=0.811..5.557 rows=150 loops=1)
>    Hash Cond: ("outer".id = "inner".annonce_id)
>    ->  Seq Scan on current a  (cost=0.00..467.24 rows=11524 width=203)
> (actual time=0.006..3.434 rows=11524 loops=1)
>    ->  Hash  (cost=7.63..7.63 rows=140 width=4) (actual time=0.242..0.242
> rows=150 loops=1)
>          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual
> time=0.156..0.186 rows=150 loops=1)
>                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150
> width=4) (actual time=0.008..0.097 rows=150 loops=1)
>                      Filter: (list_id = 4)
>  Total runtime: 5.647 ms

Uhm.. that actually runs slower...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461