Re: Strange explain on partitioned tables - Mailing list pgsql-performance

From Philippe Rimbault
Subject Re: Strange explain on partitioned tables
Date
Msg-id 4C48516B.90103@edd.fr
Whole thread Raw
In response to Re: Strange explain on partitioned tables  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance
Greg,

First : thank you for you help.



On 22/07/2010 15:32, Greg Smith wrote:
> Philippe Rimbault wrote:
>> I have one thousand inherited tables like this one (with a different
>> check constraint on each) :
>
> The PostgreSQL partitioning system is aimed to support perhaps a
> hundred inherited tables.  You can expect to get poor performance on
> queries if you create 1000 of them.  That's not the cause of your
> current problem, just pointing out there's a larger design problem
> here you'll probably have to fix one day.

Right now, there is only 6 inherited tables, but for performance issue,
where are testing solutions on more partionned systeme (all work fine
except for query with order by).

>
>> EXPLAIN ANALYZE
>>     select
>>         documents.id,
>>         documents.num,
>>         sources.name,
>>         l.name
>>     from
>>         documents,
>>         locations l,
>>         sources
>>     where
>>         documents.id_source = 113 and
>>         documents.id_location=l.id and
>>         documents.id_source=sources.id
>>     order by
>>         documents.id desc
>>     limit 5;
>
> Please don't put your EXPLAIN plans surrounded in " marks; it makes it
> harder to copy out of your message to analyze them with tools.  I put
> this bad one into http://explain.depesz.com/s/XD and it notes that the
> "public.documents.id_location = l.id" search is underestimating the
> number of rows by a factor of 8.7.  You might get a better plan if you
> can get better table statistics on that column.  Did you run ANALYZE
> since the partitioning was done?  If not, that could be making this
> worse.  You might increase the amount of table statistics on this
> specific column too, not sure what would help without knowing exactly
> what's in there.
>
> Another thing you can try is suggest the optimizer not use a hash join
> here and see if it does the right thing instead; be a useful bit of
> feedback to see what that plan turns out to be.  Just put "set
> enable_hashjoin=off;" before the rest of the query, it will only
> impact that session.
>

Sorry for the output of the EXPLAIN ...

VACUUM ANALYZE have been done just before test of query.

I think that the optimizer overestimates "public.documents.id_location =
l.id" because it plan on the primary table and not the child ...
I've change statistics to 1000 for documents.id_location and result is
the same.

I've tested "set enable_hashjoin=off;" and the result is worst (sorry
i'm still using 9.0b3) :

Limit  (cost=197755.49..197755.50 rows=5 width=23) (actual
time=4187.148..4187.150 rows=5 loops=1)
   ->  Sort  (cost=197755.49..200390.32 rows=1053932 width=23) (actual
time=4187.146..4187.147 rows=5 loops=1)
         Sort Key: public.documents.id
         Sort Method:  top-N heapsort  Memory: 17kB
         ->  Nested Loop  (cost=151258.55..180250.06 rows=1053932
width=23) (actual time=1862.214..3769.611 rows=1053929 loops=1)
               ->  Index Scan using pk_sources on sources
(cost=0.00..8.27 rows=1 width=8) (actual time=0.007..0.013 rows=1 loops=1)
                     Index Cond: (id = 113)
               ->  Merge Join  (cost=151258.55..169702.47 rows=1053932
width=21) (actual time=1862.204..3360.555 rows=1053929 loops=1)
                     Merge Cond: (l.id = public.documents.id_location)
                     ->  Sort  (cost=1.75..1.81 rows=23 width=10)
(actual time=0.028..0.036 rows=21 loops=1)
                           Sort Key: l.id
                           Sort Method:  quicksort  Memory: 17kB
                           ->  Seq Scan on locations l  (cost=0.00..1.23
rows=23 width=10) (actual time=0.002..0.009 rows=23 loops=1)
                     ->  Materialize  (cost=151256.80..156526.46
rows=1053932 width=13) (actual time=1862.162..2841.302 rows=1053929 loops=1)
                           ->  Sort  (cost=151256.80..153891.63
rows=1053932 width=13) (actual time=1862.154..2290.881 rows=1053929 loops=1)
                                 Sort Key: public.documents.id_location
                                 Sort Method:  external merge  Disk: 24496kB
                                 ->  Append  (cost=0.00..27810.36
rows=1053932 width=13) (actual time=0.003..838.644 rows=1053929 loops=1)
                                       ->  Seq Scan on documents
(cost=0.00..18.25 rows=3 width=39) (actual time=0.000..0.000 rows=0 loops=1)
                                             Filter: (id_source = 113)
                                       ->  Seq Scan on documents_mond
documents  (cost=0.00..27792.11 rows=1053929 width=13) (actual
time=0.002..502.345 rows=1053929 loops=1)
                                             Filter: (id_source = 113)
Total runtime: 4198.703 ms



pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Strange explain on partitioned tables
Next
From: Craig James
Date:
Subject: Re: Using more tha one index per table