multiple joins + Order by + LIMIT query performance issue - Mailing list pgsql-performance

From Antoine Baudoux
Subject multiple joins + Order by + LIMIT query performance issue
Date
Msg-id 938EC908-FB9A-4BFA-950A-053BE3C87085@taktik.be
Whole thread Raw
Responses Re: multiple joins + Order by + LIMIT query performance issue
Re: multiple joins + Order by + LIMIT query performance issue
List pgsql-performance
Hello,

I have a query that runs for hours when joining 4 tables but takes
milliseconds when joining one MORE table to the query.
I have One big table, t_event (8 million rows) and 4 small tables
(t_network,t_system,t_service, t_interface, all < 1000 rows). This
query takes a few milliseconds :
[code]
select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
inner join t_Network network on interface.network_id=network.id
where (network.customer_id=1) order by event.c_date desc limit 25

"Limit  (cost=23981.18..23981.18 rows=1 width=977)"
"  ->  Sort  (cost=23981.18..23981.18 rows=1 width=977)"
"        Sort Key: this_.c_date"
"        ->  Nested Loop  (cost=0.00..23981.17 rows=1 width=977)"
"              ->  Nested Loop  (cost=0.00..23974.89 rows=1 width=961)"
"                    ->  Nested Loop  (cost=0.00..191.42 rows=1
width=616)"
"                          Join Filter: (service_s3_.system_id =
service1_.system_id)"
"                          ->  Nested Loop  (cost=0.00..9.29 rows=1
width=576)"
"                                ->  Seq Scan on t_network
service_s4_  (cost=0.00..1.01 rows=1 width=18)"
"                                      Filter: (customer_id = 1)"
"                                ->  Index Scan using
interface_network_id_idx on t_interface service_s3_  (cost=0.00..8.27
rows=1 width=558)"
"                                      Index Cond:
(service_s3_.network_id = service_s4_.id)"
"                          ->  Seq Scan on t_service service1_
(cost=0.00..109.28 rows=5828 width=40)"
"                    ->  Index Scan using event_svc_id_idx on t_event
this_  (cost=0.00..23681.12 rows=8188 width=345)"
"                          Index Cond: (this_.service_id =
service1_.id)"
"              ->  Index Scan using t_system_pkey on t_system
service_s2_  (cost=0.00..6.27 rows=1 width=16)"
"                    Index Cond: (service_s2_.id = service1_.system_id)"
[/code]

This one takes HOURS, but I'm joining one table LESS :

[code]
select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
where (interface.network_id=1) order by event.c_date desc limit 25

"Limit  (cost=147.79..2123.66 rows=10 width=959)"
"  ->  Nested Loop  (cost=147.79..2601774.46 rows=13167 width=959)"
"        Join Filter: (service1_.id = this_.service_id)"
"        ->  Index Scan Backward using event_date_idx on t_event
this_  (cost=0.00..887080.22 rows=8466896 width=345)"
"        ->  Materialize  (cost=147.79..147.88 rows=9 width=614)"
"              ->  Hash Join  (cost=16.56..147.79 rows=9 width=614)"
"                    Hash Cond: (service1_.system_id = service_s2_.id)"
"                    ->  Seq Scan on t_service service1_
(cost=0.00..109.28 rows=5828 width=40)"
"                    ->  Hash  (cost=16.55..16.55 rows=1 width=574)"
"                          ->  Nested Loop  (cost=0.00..16.55 rows=1
width=574)"
"                                ->  Index Scan using
interface_network_id_idx on t_interface service_s3_  (cost=0.00..8.27
rows=1 width=558)"
"                                      Index Cond: (network_id = 1)"
"                                ->  Index Scan using t_system_pkey on
t_system service_s2_  (cost=0.00..8.27 rows=1 width=16)"
"                                      Index Cond: (service_s2_.id =
service_s3_.system_id)"
[/code]

My understanding is that in the first case the sort is done after all
the table joins and filtering, but in the second case ALL the rows in
t_event are scanned and sorted before the join. There is an index on
the sorting column. If I remove this index, the query runs very fast.
But I still need this index for other queries.So I must force the
planner to do the sort after the join, in the second case. How can i
do that?

Thanks a lot for your help,

Antoine


pgsql-performance by date:

Previous
From: Frank van Vugt
Date:
Subject: Re: plan difference between set-returning function with ROWS within IN() and a plain join
Next
From: "Guillaume Smet"
Date:
Subject: Re: multiple joins + Order by + LIMIT query performance issue