Re: multiple joins + Order by + LIMIT query performance issue - Mailing list pgsql-performance

From Antoine Baudoux
Subject Re: multiple joins + Order by + LIMIT query performance issue
Date
Msg-id A97B7728-FCF9-4F66-AA64-E3F81B31F9F8@taktik.be
Whole thread Raw
In response to Re: multiple joins + Order by + LIMIT query performance issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
>
> If a misestimate of this kind is bugging you enough that you're
> willing
> to change the query, I think you can fix it like this:
>
>     select ... from foo order by x limit n;
> =>
>     select ... from (select ... from foo order by x) ss limit n;
>
> The subselect will be planned without awareness of the LIMIT, so you
> should get a plan using a sort rather than one that bets on the LIMIT
> being reached quickly.

I tried that, using a subquery. Unfortunately this does not change
anything :

select * from (select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
where (interface.network_id=1) order by event.c_date desc ) ss limit 25

"Limit  (cost=147.79..5563.93 rows=25 width=3672)"
"  ->  Subquery Scan ss  (cost=147.79..2896263.01 rows=13368
width=3672)"
"        ->  Nested Loop  (cost=147.79..2896129.33 rows=13368
width=958)"
"              Join Filter: (service.id = event.service_id)"
"              ->  Index Scan Backward using event_date_idx on t_event
event  (cost=0.00..1160633.69 rows=8569619 width=344)"
"              ->  Materialize  (cost=147.79..147.88 rows=9 width=614)"
"                    ->  Hash Join  (cost=16.56..147.79 rows=9
width=614)"
"                          Hash Cond: (service.system_id = system.id)"
"                          ->  Seq Scan on t_service service
(cost=0.00..109.28 rows=5828 width=40)"
"                          ->  Hash  (cost=16.55..16.55 rows=1
width=574)"
"                                ->  Nested Loop  (cost=0.00..16.55
rows=1 width=574)"
"                                      ->  Index Scan using
interface_network_id_idx on t_interface interface  (cost=0.00..8.27
rows=1 width=558)"
"                                            Index Cond: (network_id =
1)"
"                                      ->  Index Scan using
t_system_pkey on t_system system  (cost=0.00..8.27 rows=1 width=16)"
"                                            Index Cond: (system.id =
interface.system_id)"


The worst thing about all this is that there are ZERO rows to join
with the t_event table. So the planner decide to index-scan 8 millions
row, where there is no hope of finding a match!
This seems a very ,very , very poor decision

pgsql-performance by date:

Previous
From: "Josh Cole"
Date:
Subject: Re: pgfouine - commit details?
Next
From: "Albe Laurenz *EXTERN*"
Date:
Subject: Re: RAID 10 Benchmark with different I/O schedulers