Thread: bad plan

bad plan

From

Julien Cigar

Date:

05 April 2012, 11:48:02

Hello,

I have an extremely bad plan for one of my colleague's query. Basically
PostgreSQL chooses to seq scan instead of index scan. This is on:

antabif=# select version();
                                                  version
----------------------------------------------------------------------------------------------------------

  PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC)
4.2.1 20070719  [FreeBSD], 64-bit

The machines has 4GB of RAM with the following config:
- shared_buffers: 512MB
- effective_cache_size: 2GB
- work_mem: 32MB
- maintenance_work_mem: 128MB
- default_statistics_target: 300
- temp_buffers: 64MB
- wal_buffers: 8MB
- checkpoint_segments = 15

The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:

- http://www.pastie.org/3731956 : with default config
- http://www.pastie.org/3731960 : this is with enable_seq_scan = off
- http://www.pastie.org/3731962 : I tried to play on the various cost
settings but it's doesn't change anything, except setting
random_page_cost to 1 (which will lead to bad plans for other queries,
so not a solution)
- http://www.pastie.org/3732035 : with enable_hashagg and
enable_hashjoin to false

I'm currently out of idea why PostgreSQL still chooses a bad plan for
this query ... any hint ?

Thank you,
Julien

--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

Attachment

jcigar.vcf

Re: bad plan

From

"Kevin Grittner"

Date:

05 April 2012, 14:26:33

Julien Cigar <jcigar@ulb.ac.be> wrote:

> I tried to play on the various cost settings but it's doesn't
> change anything, except setting random_page_cost to 1 (which will
> lead to bad plans for other queries, so not a solution)

Yeah, you clearly don't have the active portion of your database
fully cached, so you don't want random_page_cost to go as low as
seq_page_cost.

Here's one suggestion to try:

random_page_cost = 2
cpu_tuple_cost = 0.05

I have found that combination to work well for me when the level of
caching is about where you're seeing it.  I am becoming increasingly
of the opinion that the default for cpu_tuple_cost should be higher
than 0.01.

Please let us know whether that helps.

-Kevin

Re: bad plan

From

Ants Aasma

Date:

05 April 2012, 19:48:47

On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar <jcigar@ulb.ac.be> wrote:
> - http://www.pastie.org/3731956 : with default config
> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off

It looks like the join selectivity of (context_to_context_links,
ancestors) is being overestimated by almost two orders of magnitude.
The optimizer thinks that there are 564 rows in the
context_to_context_links table for each taxon_id, while in fact for
this query the number is 9. To confirm that this, you can force the
selectivity estimate to be 200x lower by adding a geo_id = geod_id
where clause to the subquery.

If it does help, then the next question would be why is the estimate
so much off. It could be either because the stats for
context_to_context_links.taxon_id are wrong or because
ancestors.taxon_id(subphylum_id = 18830) is a special case. To help
figuring this is out, you could run the following to queries and post
the results:

SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
dist GROUP BY 1 ORDER BY 1;

SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
and taxon_id= ANY ( select taxon_id from rab.ancestors  where
   ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
1 ORDER BY 1;

If the second distribution has a significantly different shape then
cross column statistics are necessary to get good plans. As it happens
I'm working on adding this functionality to PostgreSQL and would love
to hear more details about your use-case to understand if it would be
solved by this work.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Re: bad plan

From

Julien Cigar

Date:

06 April 2012, 11:16:58

On 04/05/2012 21:47, Ants Aasma wrote:
> On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar<jcigar@ulb.ac.be>  wrote:
>> - http://www.pastie.org/3731956 : with default config
>> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off
> It looks like the join selectivity of (context_to_context_links,
> ancestors) is being overestimated by almost two orders of magnitude.
> The optimizer thinks that there are 564 rows in the
> context_to_context_links table for each taxon_id, while in fact for
> this query the number is 9. To confirm that this, you can force the
> selectivity estimate to be 200x lower by adding a geo_id = geod_id
> where clause to the subquery.

adding a geo_id = geo_id to the subquery helped a little bit with a
cpu_tuple_cost of 0.1: http://www.pastie.org/3738224 :

without:

Index Scan using ltlc_taxon_id_idxoncontext_to_context_links  (cost=0.00..146.93  rows=341  width=8) (actual
time=0.004..0.019 rows=9  loops=736) 

with geo_id = geo_id:

Index Scan using ltlc_taxon_id_idxoncontext_to_context_links  (cost=0.00..148.11  rows=2  width=8) (actual
time=0.004..0.020 rows=9  loops=736) 


> If it does help, then the next question would be why is the estimate
> so much off. It could be either because the stats for
> context_to_context_links.taxon_id are wrong or because
> ancestors.taxon_id(subphylum_id = 18830) is a special case. To help
> figuring this is out, you could run the following to queries and post
> the results:
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
> dist GROUP BY 1 ORDER BY 1;
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
> and taxon_id= ANY ( select taxon_id from rab.ancestors  where
>     ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
> 1 ORDER BY 1;

I'm sorry but I get an "ERROR:  division by zero" for both of your queries..

> If the second distribution has a significantly different shape then
> cross column statistics are necessary to get good plans. As it happens
> I'm working on adding this functionality to PostgreSQL and would love
> to hear more details about your use-case to understand if it would be
> solved by this work.

Thank you for your help,
Julien

> Regards,
> Ants Aasma


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

Attachment

jcigar.vcf