Workaround for cross column stats dependency - Mailing list pgsql-performance
From:      Guillaume Smet
Subject:   Workaround for cross column stats dependency
Date:
Msg-id:    1d4e0c10801221657s6b345fcdqd1c0b3f20e519316@mail.gmail.com
Responses: Re: Workaround for cross column stats dependency
List:      pgsql-performance
Hi -performance,

While testing 8.3, I found this query, which is equally slow on 8.1 and 8.3 and seems really slow for a not-so-complex query. The stats are as good as possible, and the behaviour of PostgreSQL seems logical considering the stats, but I'm looking for a workaround to speed up this query.

So here is the original query:

cityvox_prod=# EXPLAIN ANALYZE SELECT vq.codequar, vq.liblong, vq.libcourt
FROM lieu l, vilquartier vq, rubtylieu rtl, genrelieu gl, lieugelieu lgl
WHERE l.codequar = vq.codequar
  AND l.dfinvalidlieu IS NULL
  AND vq.codevil = 'MUC'
  AND lgl.numlieu = l.numlieu
  AND lgl.codegelieu = gl.codegelieu
  AND gl.codetylieu = rtl.codetylieu
  AND rtl.codeth = 'RES' -- the interesting part is here
GROUP BY vq.codequar, vq.liblong, vq.libcourt, vq.flagintramuros
ORDER BY vq.flagintramuros, vq.liblong;

                                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2773.18..2773.25 rows=26 width=43) (actual time=602.822..602.829 rows=13 loops=1)
   Sort Key: vq.flagintramuros, vq.liblong
   Sort Method:  quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2772.31..2772.57 rows=26 width=43) (actual time=602.769..602.778 rows=13 loops=1)
         ->  Hash Join  (cost=2737.48..2769.83 rows=248 width=43) (actual time=601.999..602.580 rows=167 loops=1)
               Hash Cond: ((vq.codequar)::text = (l.codequar)::text)
               ->  Bitmap Heap Scan on vilquartier vq  (cost=1.91..27.50 rows=47 width=43) (actual time=0.032..0.067 rows=47 loops=1)
                     Recheck Cond: ((codevil)::text = 'MUC'::text)
                     ->  Bitmap Index Scan on idx_vilquartier_codevil  (cost=0.00..1.90 rows=47 width=0) (actual time=0.023..0.023 rows=47 loops=1)
                           Index Cond: ((codevil)::text = 'MUC'::text)
               ->  Hash  (cost=2646.25..2646.25 rows=7145 width=5) (actual time=601.955..601.955 rows=62526 loops=1)
                     ->  Nested Loop  (cost=0.00..2646.25 rows=*7145* width=5) (actual time=0.058..548.618 rows=*62526* loops=1)
                           ->  Nested Loop  (cost=0.00..349.71 rows=7232 width=4) (actual time=0.049..147.221 rows=66292 loops=1)
                                 ->  Nested Loop  (cost=0.00..8.59 rows=13 width=4) (actual time=0.027..0.254 rows=88 loops=1)
                                       ->  Seq Scan on rubtylieu rtl  (cost=0.00..2.74 rows=1 width=4) (actual time=0.013..0.026 rows=1 loops=1)
                                             Filter: ((codeth)::text = 'RES'::text)
                                       ->  Index Scan using ind_genrelieu2 on genrelieu gl  (cost=0.00..5.68 rows=*14* width=8) (actual time=0.013..0.119 rows=*88* loops=1)
                                             Index Cond: ((gl.codetylieu)::text = (rtl.codetylieu)::text)
                                 ->  Index Scan using idx_lieugelieu_codegelieu on lieugelieu lgl  (cost=0.00..18.33 rows=633 width=8) (actual time=0.014..0.802 rows=753 loops=88)
                                       Index Cond: ((lgl.codegelieu)::text = (gl.codegelieu)::text)
                           ->  Index Scan using pk_lieu on lieu l  (cost=0.00..0.31 rows=1 width=9) (actual time=0.003..0.004 rows=1 loops=66292)
                                 Index Cond: (l.numlieu = lgl.numlieu)
                                 Filter: (l.dfinvalidlieu IS NULL)
 Total runtime: 602.930 ms

The query is looking for the parts of the city where we can find restaurants. The problem with this query is that we have several categories (here genrelieu) for a given type (here rubtylieu), and several types for a given theme (here the theme is codeth = 'RES'). When the value of the theme is RES (restaurants), there is only one type (it's also RES). The fact is that there are a lot of rows for the value RES in genrelieu (all the types of food and so on) compared to the other values.
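For reference, the 14-row estimate in the plan above comes from the per-column statistics on genrelieu.codetylieu, not from the actual value 'RES', which the planner only sees through the join at run time. Something along these lines against pg_stats (just an illustrative sketch, not part of the original problem) shows the distribution the planner works from:

cityvox_prod=# SELECT n_distinct, most_common_vals, most_common_freqs
               FROM pg_stats
               WHERE tablename = 'genrelieu' AND attname = 'codetylieu';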
Considering that PostgreSQL doesn't know the value RES when it estimates the scan on genrelieu (it only has the column statistics, not the value coming from the join), 14 rows is the expected estimate given the distribution of values in genrelieu, so there's nothing really wrong here. But it's really, really slow.

If I remove the join on rubtylieu and inject the obtained value directly into the query, the estimates are exact and I get a far better plan:

cityvox_prod=# EXPLAIN ANALYZE SELECT vq.codequar, vq.liblong, vq.libcourt
FROM lieu l, vilquartier vq, genrelieu gl, lieugelieu lgl
WHERE l.codequar = vq.codequar
  AND l.dfinvalidlieu IS NULL
  AND vq.codevil = 'MUC'
  AND lgl.numlieu = l.numlieu
  AND lgl.codegelieu = gl.codegelieu
  AND gl.codetylieu = 'RES'
GROUP BY vq.codequar, vq.liblong, vq.libcourt, vq.flagintramuros
ORDER BY vq.flagintramuros, vq.liblong;

                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7070.53..7070.59 rows=26 width=43) (actual time=8.502..8.511 rows=13 loops=1)
   Sort Key: vq.flagintramuros, vq.liblong
   Sort Method:  quicksort  Memory: 26kB
   ->  HashAggregate  (cost=7069.65..7069.91 rows=26 width=43) (actual time=8.451..8.458 rows=13 loops=1)
         ->  Hash Join  (cost=10.06..7053.22 rows=1643 width=43) (actual time=0.318..8.267 rows=167 loops=1)
               Hash Cond: ((lgl.codegelieu)::text = (gl.codegelieu)::text)
               ->  Nested Loop  (cost=2.63..7001.77 rows=7358 width=47) (actual time=0.098..7.361 rows=973 loops=1)
                     ->  Nested Loop  (cost=2.63..5527.36 rows=4319 width=47) (actual time=0.084..2.574 rows=630 loops=1)
                           ->  Index Scan using idx_vilquartier_codevil on vilquartier vq  (cost=0.00..34.06 rows=47 width=43) (actual time=0.023..0.062 rows=47 loops=1)
                                 Index Cond: ((codevil)::text = 'MUC'::text)
                           ->  Bitmap Heap Scan on lieu l  (cost=2.63..115.05 rows=146 width=9) (actual time=0.014..0.035 rows=13 loops=47)
                                 Recheck Cond: ((l.codequar)::text = (vq.codequar)::text)
                                 Filter: (l.dfinvalidlieu IS NULL)
                                 ->  Bitmap Index Scan on lieu_i_codequar  (cost=0.00..2.59 rows=146 width=0) (actual time=0.010..0.010 rows=13 loops=47)
                                       Index Cond: ((l.codequar)::text = (vq.codequar)::text)
                     ->  Index Scan using idx_lieugelieu_numlieu_principal on lieugelieu lgl  (cost=0.00..0.32 rows=2 width=8) (actual time=0.003..0.005 rows=2 loops=630)
                           Index Cond: (lgl.numlieu = l.numlieu)
               ->  Hash  (cost=6.33..6.33 rows=88 width=4) (actual time=0.153..0.153 rows=88 loops=1)
                     ->  Bitmap Heap Scan on genrelieu gl  (cost=2.23..6.33 rows=*88* width=4) (actual time=0.028..0.088 rows=*88* loops=1)
                           Recheck Cond: ((codetylieu)::text = 'RES'::text)
                           ->  Bitmap Index Scan on ind_genrelieu2  (cost=0.00..2.21 rows=88 width=0) (actual time=0.021..0.021 rows=88 loops=1)
                                 Index Cond: ((codetylieu)::text = 'RES'::text)
 Total runtime: 8.589 ms

So the question is: is there any way to improve the results of the original query, other than doing a first query in the application to get the list of types and injecting them into a second query (the one just above; a sketch of that two-step approach is in the PS below)?

Thanks.

--
Guillaume
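PS: to make the two-step workaround explicit, here is a rough sketch. The IN list would be built by the application from the result of the first query; it is shown here with the single value that query returns for the RES theme:

cityvox_prod=# SELECT codetylieu FROM rubtylieu WHERE codeth = 'RES';

cityvox_prod=# SELECT vq.codequar, vq.liblong, vq.libcourt
FROM lieu l, vilquartier vq, genrelieu gl, lieugelieu lgl
WHERE l.codequar = vq.codequar
  AND l.dfinvalidlieu IS NULL
  AND vq.codevil = 'MUC'
  AND lgl.numlieu = l.numlieu
  AND lgl.codegelieu = gl.codegelieu
  AND gl.codetylieu IN ('RES') -- values injected from the first query
GROUP BY vq.codequar, vq.liblong, vq.libcourt, vq.flagintramuros
ORDER BY vq.flagintramuros, vq.liblong;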