Re: Planner tuning - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Planner tuning
Date
Msg-id 45FFBB61.6090606@magproductions.nl
Whole thread Raw
In response to Re: Planner tuning  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Planner tuning  ("Dmitry Koterov" <dmitry@koterov.ru>)
List pgsql-general
Tom Lane wrote:
> Alban Hertroys <alban@magproductions.nl> writes:
>> It seems pretty obvious that the planner underestimates the cost of
>> nestloops here, is there some way to tweak this?
>
> The real problem is the factor-of-a-thousand underestimate of the size
> of this join:

Good observation, I missed that one. Thanks.

>>    ->  Nested Loop  (cost=0.00..281.74 rows=2 width=14) (actual time=0.068..14.000 rows=1683 loops=1)
>>          ->  Index Scan using fewo_location_ancestry_full_idx on fewo_location_ancestry ancestor  (cost=0.00..49.34
rows=9width=4) (actual time=0.024..0.172 rows=41 loops=1) 
>>                Index Cond: ((ancestor_id = 309) AND (ancestor_type_id = 12) AND (child_type_id = 10))
>>          ->  Index Scan using fewo_property_location_country_location_idx on fewo_property_location
property_location (cost=0.00..25.80 rows=2 width=18) (actual time=0.009..0.169 rows=41 loops=41) 
>>                Index Cond: ((property_location.country_id = 300) AND ("outer".child_id =
property_location.location_id))
>>                Filter: (property_state_id = 3)
>
> Have you got up-to-date ANALYZE stats for both of these tables?
> Maybe increasing the statistics targets for them would help.

Yes. This is as of this moment a mostly static development database that
has been vacuumed and analyzed quite recently.

> You may be kind of stuck because of the lack of cross-column statistics
> --- I suppose these columns are probably rather highly correlated ---
> but you should at least try pulling the levers you've got.
>
> One thought is that country_id is probably entirely determined by
> location_id, and possibly ancestor_type_id is determined by ancestor_id.

Actually property.location_id refers to cities, which is the deepest
level in the represented data. Country_id is the top level.

Ancestry id, type and child id, type are indeed closely related. I
changed their representation based on your suggestions.

> If so you should be leaving them out of the queries and indexes;
> they're not doing anything for you except fooling the planner about the
> net selectivity of the conditions.

I tried a few things, but it seems I am quite successful at fooling the
planner...

I changed the indices on our ancestry table to not combine id and type
on the same half of the join; which is something we're in fact never
interested in anyway. This seems to have helped some indeed.

I tried removing country_id from the equation, but I haven't had the
patience to wait for the explain analyzes to complete that way - they
take long.
I implemented it this way as an optimization; I decided to join
property_location with both period_type_property and
property_availability_month using (country_id, property_id) as FK.
That quickly narrows down the number of matching records in those
tables, which an index on property_id only somehow didn't accomplish.

The good news is that I get results under 1s without having to
explicitly sort my subquery results.
The bad news is that the estimated row counts are still quite a bit off.
I analyzed the DB just before generating the attached result.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

QUERYPLAN
         

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6780.04..6780.42 rows=1 width=182) (actual time=629.652..629.653 rows=1 loops=1)
   ->  Nested Loop  (cost=1053.78..6779.62 rows=1 width=182) (actual time=160.662..595.059 rows=969 loops=1)
         ->  Hash Join  (cost=1053.78..6688.46 rows=21 width=186) (actual time=160.493..545.222 rows=3522 loops=1)
               Hash Cond: ("outer".property_id = "inner".property_id)
               ->  GroupAggregate  (cost=0.00..5581.97 rows=3500 width=12) (actual time=0.159..363.108 rows=3522
loops=1)
                     ->  Index Scan using fewo_property_availability_month_country_property_idx on
fewo_property_availability_monthproperty_availability_month  (cost=0.00..3893.52 rows=34992 width=12) (actual
time=0.023..147.269rows=37316 loops=1) 
                           Index Cond: (300 = country_id)
               ->  Hash  (cost=1053.16..1053.16 rows=250 width=86) (actual time=160.277..160.277 rows=3522 loops=1)
                     ->  Hash Join  (cost=825.03..1053.16 rows=250 width=86) (actual time=115.767..147.429 rows=3522
loops=1)
                           Hash Cond: ("outer".property_id = "inner".property_id)
                           ->  HashAggregate  (cost=338.96..488.83 rows=2141 width=12) (actual time=64.207..77.280
rows=3522loops=1) 
                                 ->  Bitmap Heap Scan on fewo_period_type_property period_type_property
(cost=23.03..253.01rows=3438 width=12) (actual time=0.625..27.199 rows=3522 loops=1) 
                                       Recheck Cond: (300 = country_id)
                                       ->  Bitmap Index Scan on fewo_period_type_property_country_property_idx
(cost=0.00..23.03rows=3438 width=0) (actual time=0.605..0.605 rows=3522 loops=1) 
                                             Index Cond: (300 = country_id)
                           ->  Hash  (cost=473.87..473.87 rows=4881 width=18) (actual time=51.496..51.496 rows=4873
loops=1)
                                 ->  Bitmap Heap Scan on fewo_property_location property_location  (cost=50.19..473.87
rows=4881width=18) (actual time=0.974..24.530 rows=4873 loops=1) 
                                       Recheck Cond: (country_id = 300)
                                       Filter: (property_state_id = 3)
                                       ->  Bitmap Index Scan on fewo_property_location_country_property_idx
(cost=0.00..50.19rows=4912 width=0) (actual time=0.939..0.939 rows=4873 loops=1) 
                                             Index Cond: (country_id = 300)
         ->  Index Scan using fewo_location_ancestry_ancestor_child_idx on fewo_location_ancestry ancestor
(cost=0.00..4.33rows=1 width=4) (actual time=0.007..0.008 rows=0 loops=3522) 
               Index Cond: ((ancestor.ancestor_id = 309) AND (ancestor.child_id = "outer".location_id))
 Total runtime: 631.858 ms
(24 rows)


pgsql-general by date:

Previous
From: Howard Cole
Date:
Subject: Re: TSearch2 Problems
Next
From: Lars Haugseth
Date:
Subject: Re: passing passords to pgsql/pg_create/pg_dump programmatically