Thread: A questions on planner choices

A questions on planner choices

From

Edoardo Panfili

Date:

19 August 2011, 16:05:38

I apologize for my english and... also for the explanation perhaps not
very clear.
I have some doubt regarding the planner choice for my query, usually it
does a very good job and I would prefer to leave free the planner but
with this query I have some doubt:

I use tree tables, cartellino with 2 indexes
     "cartellino_punto_geom_4326" gist (the_geom)
     "specimen_idspecie" btree (idspecie)
A view named specienomi with an index on specienomi.nome
Postgres 8.4.8 with postgis 1.5.3
I can post a complete explain for each query.

This is the original query
SELECT specienomi.nome, cartellino.cont_nome,
     ST_AsGML(cartellino.the_geom)
FROM cartellino, specienomi, confini_regioni
WHERE confini_regioni.regione='UMBRIA'
     AND specienomi.nome like 'Quercus%'
     AND cartellino.idspecie=specienomi.id
     AND ST_Intersects(cartellino.the_geom,confini_regioni.the_geom4326);
it tooks 4481.933 ms
the planner does ((cartellino join confini_regioni) join specienomi) but
I think I want to try another way.


A very big enhancement with:
WITH temp_que AS (
     SELECT specienomi.nome AS nome,
         cartellino.cont_nome AS cont_nome,
         cartellino.id AS id, the_geom
     FROM cartellino, specienomi
     WHERE specienomi.nome like 'Quercus %'
         AND cartellino.idspecie=specienomi.id
)
SELECT temp_que.nome, temp_que.cont_nome, ST_AsGML(temp_que.the_geom)
FROM temp_que, confini_regioni
WHERE confini_regioni.regione='UMBRIA'
AND ST_Intersects(temp_que.the_geom,confini_regioni.the_geom4326);
The time is 45.026 ms

the question is: I am missing some index? or ST_Intersects behaves in a
way that i don't understand?



after re-reading the manual I did some other try:

set from_collapse_limit=1;
SELECT temp_que.nome, temp_que.cont_nome, ST_AsGML(temp_que.the_geom)
FROM confini_regioni,
     (SELECT specienomi.nome AS nome,
         cartellino.cont_nome AS cont_nome,
         cartellino.id AS id, the_geom
      FROM cartellino, specienomi
      WHERE specienomi.nome like 'Quercus %'
         AND cartellino.idspecie=specienomi.id
     ) AS temp_que
WHERE confini_regioni.regione='UMBRIA'
AND ST_Intersects(temp_que.the_geom,confini_regioni.the_geom4326)
ORDER BY temp_que.id;

works fine 50.126 ms


set join_collapse_limit=1;
SELECT specienomi.nome, ST_AsGML(cartellino.the_geom)
FROM confini_regioni full JOIN (
     cartellino full JOIN specienomi ON
     (cartellino.idspecie=specienomi.id)) ON
     ST_Intersects(cartellino.the_geom,confini_regioni.the_geom4326)
WHERE confini_regioni.regione='UMBRIA'
AND specienomi.nome like 'Quercus%'

is slow: 5750.499 ms
and
NOTICE:  LWGEOM_gist_joinsel called with incorrect join type

thank you
Edoardo



[1] Plan for the firts query

-------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=20.45..20.46 rows=1 width=931) (actual
time=4457.775..4457.786 rows=76 loops=1)
    Sort Key: cartellino.id
    Sort Method:  quicksort  Memory: 74kB
    ->  Hash Join  (cost=8.32..20.44 rows=1 width=931) (actual
time=243.679..4457.658 rows=76 loops=1)
          Hash Cond: (cartellino.idspecie = principale.id)
          ->  Nested Loop  (cost=0.00..9.81 rows=614 width=886) (actual
time=4.094..4439.024 rows=18370 loops=1)
                Join Filter: _st_intersects(cartellino.the_geom,
confini_regioni.the_geom4326)
                ->  Seq Scan on confini_regioni  (cost=0.00..1.25 rows=1
width=1473036) (actual time=0.017..0.021 rows=1 loops=1)
                      Filter: ((regione)::text = 'UMBRIA'::text)
                ->  Index Scan using cartellino_punto_geom_4326 on
cartellino  (cost=0.00..8.30 rows=1 width=886) (actual
time=0.059..94.148 rows=32200 loops=1)
                      Index Cond: (cartellino.the_geom &&
confini_regioni.the_geom4326)
          ->  Hash  (cost=8.28..8.28 rows=3 width=57) (actual
time=0.392..0.392 rows=74 loops=1)
                ->  Index Scan using i_specie_nome_specie_like on specie
principale  (cost=0.01..8.28 rows=3 width=57) (actual time=0.034..0.348
rows=74 loops=1)
                      Index Cond: ((esterna_nome(ibrido, proparte,
(genere)::text, [...]  (cultivar)::text) ~>=~ 'Quercus'::text) AND
(esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text)
~<~ 'Quercut'::text))
                      Filter: (esterna_nome(ibrido, proparte,
(genere)::text, [...] (cultivar)::text) ~~ 'Quercus%'::text)
  Total runtime: 4481.933 ms

Re: A questions on planner choices

From

Scott Marlowe

Date:

19 August 2011, 17:15:09

On Fri, Aug 19, 2011 at 1:05 PM, Edoardo Panfili <edoardo@aspix.it> wrote:
> [1] Plan for the firts query
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=20.45..20.46 rows=1 width=931) (actual time=4457.775..4457.786
> rows=76 loops=1)
>   Sort Key: cartellino.id
>   Sort Method:  quicksort  Memory: 74kB
>   ->  Hash Join  (cost=8.32..20.44 rows=1 width=931) (actual
> time=243.679..4457.658 rows=76 loops=1)
>         Hash Cond: (cartellino.idspecie = principale.id)
>         ->  Nested Loop  (cost=0.00..9.81 rows=614 width=886) (actual
> time=4.094..4439.024 rows=18370 loops=1)

The row estimate here is off by a factor of 30 or so.  In this case a
different join method would likely work better.   It might be that
cranking up stats for the columns involved will help, but if that
doesn't change the estimates then we might need to look elsewhere.

What's your work_mem and random_page_cost?

Re: A questions on planner choices

From

Edoardo Panfili

Date:

19 August 2011, 17:38:02

Il 19/08/11 22:15, Scott Marlowe ha scritto:
> On Fri, Aug 19, 2011 at 1:05 PM, Edoardo Panfili<edoardo@aspix.it>  wrote:
>> [1] Plan for the firts query
>>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>>   Sort  (cost=20.45..20.46 rows=1 width=931) (actual time=4457.775..4457.786
>> rows=76 loops=1)
>>    Sort Key: cartellino.id
>>    Sort Method:  quicksort  Memory: 74kB
>>    ->    Hash Join  (cost=8.32..20.44 rows=1 width=931) (actual
>> time=243.679..4457.658 rows=76 loops=1)
>>          Hash Cond: (cartellino.idspecie = principale.id)
>>          ->    Nested Loop  (cost=0.00..9.81 rows=614 width=886) (actual
>> time=4.094..4439.024 rows=18370 loops=1)
>
> The row estimate here is off by a factor of 30 or so.  In this case a
> different join method would likely work better.   It might be that
> cranking up stats for the columns involved will help, but if that
> doesn't change the estimates then we might need to look elsewhere.
>
> What's your work_mem and random_page_cost?
  work_mem = 1MB
  random_page_cost = 4

I am using an SSD but the production system uses a standard hard disk.

I did a try also with
set default_statistics_target=10000;
vacuum analyze cartellino;
vacuum analyze specie; -- the base table for specienomi
vacuum analyze confini_regioni;

but is always 4617.023 ms

Edoardo

Re: A questions on planner choices

From

Scott Marlowe

Date:

19 August 2011, 19:11:49

On Fri, Aug 19, 2011 at 2:37 PM, Edoardo Panfili <edoardo@aspix.it> wrote:
>
>  work_mem = 1MB
>  random_page_cost = 4
>
> I am using an SSD but the production system uses a standard hard disk.
>
> I did a try also with
> set default_statistics_target=10000;
> vacuum analyze cartellino;
> vacuum analyze specie; -- the base table for specienomi
> vacuum analyze confini_regioni;
>
> but is always 4617.023 ms

OK, try turning up work_mem for just this connection, i.e.:

psql mydb
set work_mem='64MB';
explain analyze select .... ;

and see if you get a different plan.  Often you only need a slightly
higher work_mem to get a better plan.  We're looking for a hash_join
to occur here, which should be much much faster.  After testing you
can set work_mem globally in the postgresql.conf file.  Try to keep it
smallish, as it's per sort per connection, so usage can go up really
fast with a lot of active connections and swamp your server's memory.
I run a 128G memory machine with ~500 connections and have it set to
16MB.

Re: A questions on planner choices

From

Tom Lane

Date:

19 August 2011, 23:28:22

Edoardo Panfili <edoardo@aspix.it> writes:
> [ poor plan for a Postgis query with ]
> Postgres 8.4.8 with postgis 1.5.3

I think that most of the issue here is poor selectivity estimation for
the Postgis operations, particularly &&.  I suggest that you should ask
about this on the postgis mailing lists.  They might well tell you to
try a newer release --- they may have improved things since 1.5.3.

> NOTICE:  LWGEOM_gist_joinsel called with incorrect join type

You should *definitely* report that to the Postgis guys, because it's a
bug.

            regards, tom lane

Re: A questions on planner choices

From

Edoardo Panfili

Date:

20 August 2011, 03:58:52

Il 20/08/11 04:28, Tom Lane ha scritto:
> Edoardo Panfili<edoardo@aspix.it>  writes:
>> [ poor plan for a Postgis query with ]
>> Postgres 8.4.8 with postgis 1.5.3
>
> I think that most of the issue here is poor selectivity estimation for
> the Postgis operations, particularly&&.  I suggest that you should ask
> about this on the postgis mailing lists.  They might well tell you to
> try a newer release --- they may have improved things since 1.5.3.
1.5.3 is the latest stable release, I am downloading 2.0.0SVN


>> NOTICE:  LWGEOM_gist_joinsel called with incorrect join type
>
> You should *definitely* report that to the Postgis guys, because it's a
> bug.
I will do it

Thank you
Edoardo Panfili