Thread: A questions on planner choices
I apologize for my english and... also for the explanation perhaps not very clear. I have some doubt regarding the planner choice for my query, usually it does a very good job and I would prefer to leave free the planner but with this query I have some doubt: I use tree tables, cartellino with 2 indexes "cartellino_punto_geom_4326" gist (the_geom) "specimen_idspecie" btree (idspecie) A view named specienomi with an index on specienomi.nome Postgres 8.4.8 with postgis 1.5.3 I can post a complete explain for each query. This is the original query SELECT specienomi.nome, cartellino.cont_nome, ST_AsGML(cartellino.the_geom) FROM cartellino, specienomi, confini_regioni WHERE confini_regioni.regione='UMBRIA' AND specienomi.nome like 'Quercus%' AND cartellino.idspecie=specienomi.id AND ST_Intersects(cartellino.the_geom,confini_regioni.the_geom4326); it tooks 4481.933 ms the planner does ((cartellino join confini_regioni) join specienomi) but I think I want to try another way. A very big enhancement with: WITH temp_que AS ( SELECT specienomi.nome AS nome, cartellino.cont_nome AS cont_nome, cartellino.id AS id, the_geom FROM cartellino, specienomi WHERE specienomi.nome like 'Quercus %' AND cartellino.idspecie=specienomi.id ) SELECT temp_que.nome, temp_que.cont_nome, ST_AsGML(temp_que.the_geom) FROM temp_que, confini_regioni WHERE confini_regioni.regione='UMBRIA' AND ST_Intersects(temp_que.the_geom,confini_regioni.the_geom4326); The time is 45.026 ms the question is: I am missing some index? or ST_Intersects behaves in a way that i don't understand? after re-reading the manual I did some other try: set from_collapse_limit=1; SELECT temp_que.nome, temp_que.cont_nome, ST_AsGML(temp_que.the_geom) FROM confini_regioni, (SELECT specienomi.nome AS nome, cartellino.cont_nome AS cont_nome, cartellino.id AS id, the_geom FROM cartellino, specienomi WHERE specienomi.nome like 'Quercus %' AND cartellino.idspecie=specienomi.id ) AS temp_que WHERE confini_regioni.regione='UMBRIA' AND ST_Intersects(temp_que.the_geom,confini_regioni.the_geom4326) ORDER BY temp_que.id; works fine 50.126 ms set join_collapse_limit=1; SELECT specienomi.nome, ST_AsGML(cartellino.the_geom) FROM confini_regioni full JOIN ( cartellino full JOIN specienomi ON (cartellino.idspecie=specienomi.id)) ON ST_Intersects(cartellino.the_geom,confini_regioni.the_geom4326) WHERE confini_regioni.regione='UMBRIA' AND specienomi.nome like 'Quercus%' is slow: 5750.499 ms and NOTICE: LWGEOM_gist_joinsel called with incorrect join type thank you Edoardo [1] Plan for the firts query ------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=20.45..20.46 rows=1 width=931) (actual time=4457.775..4457.786 rows=76 loops=1) Sort Key: cartellino.id Sort Method: quicksort Memory: 74kB -> Hash Join (cost=8.32..20.44 rows=1 width=931) (actual time=243.679..4457.658 rows=76 loops=1) Hash Cond: (cartellino.idspecie = principale.id) -> Nested Loop (cost=0.00..9.81 rows=614 width=886) (actual time=4.094..4439.024 rows=18370 loops=1) Join Filter: _st_intersects(cartellino.the_geom, confini_regioni.the_geom4326) -> Seq Scan on confini_regioni (cost=0.00..1.25 rows=1 width=1473036) (actual time=0.017..0.021 rows=1 loops=1) Filter: ((regione)::text = 'UMBRIA'::text) -> Index Scan using cartellino_punto_geom_4326 on cartellino (cost=0.00..8.30 rows=1 width=886) (actual time=0.059..94.148 rows=32200 loops=1) Index Cond: (cartellino.the_geom && confini_regioni.the_geom4326) -> Hash (cost=8.28..8.28 rows=3 width=57) (actual time=0.392..0.392 rows=74 loops=1) -> Index Scan using i_specie_nome_specie_like on specie principale (cost=0.01..8.28 rows=3 width=57) (actual time=0.034..0.348 rows=74 loops=1) Index Cond: ((esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text) ~>=~ 'Quercus'::text) AND (esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text) ~<~ 'Quercut'::text)) Filter: (esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text) ~~ 'Quercus%'::text) Total runtime: 4481.933 ms
On Fri, Aug 19, 2011 at 1:05 PM, Edoardo Panfili <edoardo@aspix.it> wrote: > [1] Plan for the firts query > ------------------------------------------------------------------------------------------------------------------------------------------------------- > Sort (cost=20.45..20.46 rows=1 width=931) (actual time=4457.775..4457.786 > rows=76 loops=1) > Sort Key: cartellino.id > Sort Method: quicksort Memory: 74kB > -> Hash Join (cost=8.32..20.44 rows=1 width=931) (actual > time=243.679..4457.658 rows=76 loops=1) > Hash Cond: (cartellino.idspecie = principale.id) > -> Nested Loop (cost=0.00..9.81 rows=614 width=886) (actual > time=4.094..4439.024 rows=18370 loops=1) The row estimate here is off by a factor of 30 or so. In this case a different join method would likely work better. It might be that cranking up stats for the columns involved will help, but if that doesn't change the estimates then we might need to look elsewhere. What's your work_mem and random_page_cost?
Il 19/08/11 22:15, Scott Marlowe ha scritto: > On Fri, Aug 19, 2011 at 1:05 PM, Edoardo Panfili<edoardo@aspix.it> wrote: >> [1] Plan for the firts query >> ------------------------------------------------------------------------------------------------------------------------------------------------------- >> Sort (cost=20.45..20.46 rows=1 width=931) (actual time=4457.775..4457.786 >> rows=76 loops=1) >> Sort Key: cartellino.id >> Sort Method: quicksort Memory: 74kB >> -> Hash Join (cost=8.32..20.44 rows=1 width=931) (actual >> time=243.679..4457.658 rows=76 loops=1) >> Hash Cond: (cartellino.idspecie = principale.id) >> -> Nested Loop (cost=0.00..9.81 rows=614 width=886) (actual >> time=4.094..4439.024 rows=18370 loops=1) > > The row estimate here is off by a factor of 30 or so. In this case a > different join method would likely work better. It might be that > cranking up stats for the columns involved will help, but if that > doesn't change the estimates then we might need to look elsewhere. > > What's your work_mem and random_page_cost? work_mem = 1MB random_page_cost = 4 I am using an SSD but the production system uses a standard hard disk. I did a try also with set default_statistics_target=10000; vacuum analyze cartellino; vacuum analyze specie; -- the base table for specienomi vacuum analyze confini_regioni; but is always 4617.023 ms Edoardo
On Fri, Aug 19, 2011 at 2:37 PM, Edoardo Panfili <edoardo@aspix.it> wrote: > > work_mem = 1MB > random_page_cost = 4 > > I am using an SSD but the production system uses a standard hard disk. > > I did a try also with > set default_statistics_target=10000; > vacuum analyze cartellino; > vacuum analyze specie; -- the base table for specienomi > vacuum analyze confini_regioni; > > but is always 4617.023 ms OK, try turning up work_mem for just this connection, i.e.: psql mydb set work_mem='64MB'; explain analyze select .... ; and see if you get a different plan. Often you only need a slightly higher work_mem to get a better plan. We're looking for a hash_join to occur here, which should be much much faster. After testing you can set work_mem globally in the postgresql.conf file. Try to keep it smallish, as it's per sort per connection, so usage can go up really fast with a lot of active connections and swamp your server's memory. I run a 128G memory machine with ~500 connections and have it set to 16MB.
Edoardo Panfili <edoardo@aspix.it> writes: > [ poor plan for a Postgis query with ] > Postgres 8.4.8 with postgis 1.5.3 I think that most of the issue here is poor selectivity estimation for the Postgis operations, particularly &&. I suggest that you should ask about this on the postgis mailing lists. They might well tell you to try a newer release --- they may have improved things since 1.5.3. > NOTICE: LWGEOM_gist_joinsel called with incorrect join type You should *definitely* report that to the Postgis guys, because it's a bug. regards, tom lane
Il 20/08/11 04:28, Tom Lane ha scritto: > Edoardo Panfili<edoardo@aspix.it> writes: >> [ poor plan for a Postgis query with ] >> Postgres 8.4.8 with postgis 1.5.3 > > I think that most of the issue here is poor selectivity estimation for > the Postgis operations, particularly&&. I suggest that you should ask > about this on the postgis mailing lists. They might well tell you to > try a newer release --- they may have improved things since 1.5.3. 1.5.3 is the latest stable release, I am downloading 2.0.0SVN >> NOTICE: LWGEOM_gist_joinsel called with incorrect join type > > You should *definitely* report that to the Postgis guys, because it's a > bug. I will do it Thank you Edoardo Panfili