Thread: multiple joins + ORDER BY + LIMIT query performance issue
Hello, I have a query that runs for hours when joining 4 tables but takes milliseconds when joining one MORE table to the query. I have one big table, t_event (8 million rows), and 4 small tables (t_network, t_system, t_service, t_interface, all < 1000 rows).

This query takes a few milliseconds:

[code]
select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
inner join t_Network network on interface.network_id=network.id
where (network.customer_id=1)
order by event.c_date desc limit 25

Limit  (cost=23981.18..23981.18 rows=1 width=977)
  ->  Sort  (cost=23981.18..23981.18 rows=1 width=977)
        Sort Key: this_.c_date
        ->  Nested Loop  (cost=0.00..23981.17 rows=1 width=977)
              ->  Nested Loop  (cost=0.00..23974.89 rows=1 width=961)
                    ->  Nested Loop  (cost=0.00..191.42 rows=1 width=616)
                          Join Filter: (service_s3_.system_id = service1_.system_id)
                          ->  Nested Loop  (cost=0.00..9.29 rows=1 width=576)
                                ->  Seq Scan on t_network service_s4_  (cost=0.00..1.01 rows=1 width=18)
                                      Filter: (customer_id = 1)
                                ->  Index Scan using interface_network_id_idx on t_interface service_s3_  (cost=0.00..8.27 rows=1 width=558)
                                      Index Cond: (service_s3_.network_id = service_s4_.id)
                          ->  Seq Scan on t_service service1_  (cost=0.00..109.28 rows=5828 width=40)
                    ->  Index Scan using event_svc_id_idx on t_event this_  (cost=0.00..23681.12 rows=8188 width=345)
                          Index Cond: (this_.service_id = service1_.id)
              ->  Index Scan using t_system_pkey on t_system service_s2_  (cost=0.00..6.27 rows=1 width=16)
                    Index Cond: (service_s2_.id = service1_.system_id)
[/code]

This one takes HOURS, but I'm joining one table LESS:

[code]
select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
where (interface.network_id=1)
order by event.c_date desc limit 25

Limit  (cost=147.79..2123.66 rows=10 width=959)
  ->  Nested Loop  (cost=147.79..2601774.46 rows=13167 width=959)
        Join Filter: (service1_.id = this_.service_id)
        ->  Index Scan Backward using event_date_idx on t_event this_  (cost=0.00..887080.22 rows=8466896 width=345)
        ->  Materialize  (cost=147.79..147.88 rows=9 width=614)
              ->  Hash Join  (cost=16.56..147.79 rows=9 width=614)
                    Hash Cond: (service1_.system_id = service_s2_.id)
                    ->  Seq Scan on t_service service1_  (cost=0.00..109.28 rows=5828 width=40)
                    ->  Hash  (cost=16.55..16.55 rows=1 width=574)
                          ->  Nested Loop  (cost=0.00..16.55 rows=1 width=574)
                                ->  Index Scan using interface_network_id_idx on t_interface service_s3_  (cost=0.00..8.27 rows=1 width=558)
                                      Index Cond: (network_id = 1)
                                ->  Index Scan using t_system_pkey on t_system service_s2_  (cost=0.00..8.27 rows=1 width=16)
                                      Index Cond: (service_s2_.id = service_s3_.system_id)
[/code]

My understanding is that in the first case the sort is done after all the table joins and filtering, but in the second case ALL the rows in t_event are scanned and sorted before the join. There is an index on the sorting column. If I remove this index, the query runs very fast, but I still need it for other queries. So I must force the planner to do the sort after the join in the second case. How can I do that?

Thanks a lot for your help,

Antoine
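The difference between the two plans can be sketched outside the database. The simulation below (plain Python with invented table sizes and values; it models the two planner strategies, not PostgreSQL itself) contrasts joining first and then sorting the few survivors against walking a date-ordered index backward hoping to hit 25 matches. When the small-table joins produce no matching service_ids at all, the second strategy examines every event:

```python
import random

random.seed(0)
N_EVENTS = 100_000                            # stand-in for t_event's 8M rows
events = [(date, random.randrange(1, 500))    # (c_date, service_id)
          for date in range(N_EVENTS)]

# "Index" on service_id: service_id -> rows, like event_svc_id_idx
by_service = {}
for row in events:
    by_service.setdefault(row[1], []).append(row)

# "Index" on c_date, descending, like event_date_idx scanned backward
by_date_desc = sorted(events, reverse=True)

# Suppose the small-table joins yield no services (network 1 is empty)
matching_services = set()

# Plan 1 (fast query): probe the service_id index, then sort the survivors
survivors = [r for s in matching_services for r in by_service.get(s, [])]
plan1 = sorted(survivors, reverse=True)[:25]

# Plan 2 (slow query): walk the date index, stop after 25 joined rows
plan2, scanned = [], 0
for row in by_date_desc:
    scanned += 1
    if row[1] in matching_services:
        plan2.append(row)
        if len(plan2) == 25:
            break

print(len(plan1), len(plan2), scanned)   # 0 0 100000: plan 2 read everything
```

Both plans return the same (empty) result, but plan 2 never finds its 25 rows and only stops when the whole index is exhausted, which is exactly the pathology in the slow EXPLAIN above.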
Antoine,

On Tue, May 6, 2008 at 5:03 PM, Antoine Baudoux <ab@taktik.be> wrote:
> Limit  (cost=23981.18..23981.18 rows=1 width=977)
>   ->  Sort  (cost=23981.18..23981.18 rows=1 width=977)
>         Sort Key: this_.c_date

Can you please provide the EXPLAIN ANALYZE output instead of EXPLAIN?

Thanks.

--
Guillaume
Here is the explain analyse for the first query; the other is still running...

[code]
explain analyse select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
inner join t_Network network on interface.network_id=network.id
where (network.customer_id=1)
order by event.c_date desc limit 25

Limit  (cost=11761.44..11761.45 rows=1 width=976) (actual time=0.047..0.047 rows=0 loops=1)
  ->  Sort  (cost=11761.44..11761.45 rows=1 width=976) (actual time=0.045..0.045 rows=0 loops=1)
        Sort Key: event.c_date
        Sort Method:  quicksort  Memory: 17kB
        ->  Nested Loop  (cost=0.00..11761.43 rows=1 width=976) (actual time=0.024..0.024 rows=0 loops=1)
              ->  Nested Loop  (cost=0.00..11755.15 rows=1 width=960) (actual time=0.024..0.024 rows=0 loops=1)
                    ->  Nested Loop  (cost=0.00..191.42 rows=1 width=616) (actual time=0.024..0.024 rows=0 loops=1)
                          Join Filter: (interface.system_id = service.system_id)
                          ->  Nested Loop  (cost=0.00..9.29 rows=1 width=576) (actual time=0.023..0.023 rows=0 loops=1)
                                ->  Seq Scan on t_network network  (cost=0.00..1.01 rows=1 width=18) (actual time=0.009..0.009 rows=1 loops=1)
                                      Filter: (customer_id = 1)
                                ->  Index Scan using interface_network_id_idx on t_interface interface  (cost=0.00..8.27 rows=1 width=558) (actual time=0.011..0.011 rows=0 loops=1)
                                      Index Cond: (interface.network_id = network.id)
                          ->  Seq Scan on t_service service  (cost=0.00..109.28 rows=5828 width=40) (never executed)
                    ->  Index Scan using event_svc_id_idx on t_event event  (cost=0.00..11516.48 rows=3780 width=344) (never executed)
                          Index Cond: (event.service_id = service.id)
              ->  Index Scan using t_system_pkey on t_system system  (cost=0.00..6.27 rows=1 width=16) (never executed)
                    Index Cond: (system.id = service.system_id)
Total runtime: 0.362 ms
[/code]

On May 6, 2008, at 5:38 PM, Guillaume Smet wrote:
> Antoine,
>
> On Tue, May 6, 2008 at 5:03 PM, Antoine Baudoux <ab@taktik.be> wrote:
>> Limit  (cost=23981.18..23981.18 rows=1 width=977)
>>   ->  Sort  (cost=23981.18..23981.18 rows=1 width=977)
>>         Sort Key: this_.c_date
>
> Can you please provide the EXPLAIN ANALYZE output instead of EXPLAIN?
>
> Thanks.
>
> --
> Guillaume
On Tue, 2008-05-06 at 16:03 +0100, Antoine Baudoux wrote:
> My understanding is that in the first case the sort is
> done after all the table joins and filtering, but in the
> second case ALL the rows in t_event are scanned and sorted
> before the join.

You've actually run into a problem that's bitten us in the ass a couple of times. The problem with your second query is that it's *too* efficient. You'll notice the first plan uses a bevy of nested loops, which is very risky if the row estimates are not really, really accurate. The planner says "Hey, customer_id=1 could be several rows in the t_network table, but not too many... I better check them one by one." I've turned off nested loops sometimes to avoid queries that would run several hours due to mis-estimation, but it looks like yours was just fine.

The second query says "Awesome! Only one network... I can just search the index of t_event backwards for this small result set!"

But here's the rub... try your query *without* the limit clause, and you may find it's actually faster, because the planner suddenly thinks it will have to scan the whole table, so it chooses an alternate plan (probably back to the nested loop). Alternatively, take off the order-by clause, and it'll remove the slow backwards index scan.

I'm not sure what causes this, but the problem with indexes is that they're not necessarily in the order you want unless you also cluster them, so a backwards index scan is almost always the wrong answer. Personally I consider this a bug, and it's been around since at least the 8.1 tree. The only real answer is that you have a fast version of the query, so try and play with it until it acts the way you want.

--
Shaun Thomas
Database Administrator
Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com
Thanks a lot for your answer; there are some points I didn't understand.

On May 6, 2008, at 6:43 PM, Shaun Thomas wrote:
>
> The second query says "Awesome! Only one network... I can just search
> the index of t_event backwards for this small result set!"
>

Shouldn't it be the opposite? Considering that only a few rows must be "joined" (sorry, I'm not familiar with DBMS terms) with the t_event table, why not simply look up the corresponding rows in the t_event table using the service_id foreign key, then do the sort? Isn't the planner fooled by the index on the sorting column? If I remove the index, the query runs OK.

> But here's the rub... try your query *without* the limit clause, and
> you may find it's actually faster, because the planner suddenly thinks
> it will have to scan the whole table, so it chooses an alternate plan
> (probably back to the nested loop). Alternatively, take off the
> order-by clause, and it'll remove the slow backwards index-scan.

You are right: if I remove the order-by clause, it doesn't do the backwards index scan. And if I remove the limit and keep the order-by clause, the backwards index scan is gone too, and the query runs in a few millisecs!!

This is crazy: simply by adding a LIMIT to a query, the planning is changed in a very bad way. Does the planner use the LIMIT as a sort of hint?

Thank you for your explanations,

Antoine Baudoux
Shaun Thomas <sthomas@leapfrogonline.com> writes:
> I'm not sure what causes this, but the problem with indexes is that
> they're not necessarily in the order you want unless you also cluster
> them, so a backwards index scan is almost always the wrong answer.

Whether the scan is forwards or backwards has nothing to do with it. The planner is using the index ordering to avoid having to do a full-table scan and sort. It's essentially betting that it will find 25 (or whatever your LIMIT is) rows that satisfy the other query conditions soon enough in the index scan to make this faster than the full-scan approach. If there are a lot fewer matching rows than it expects, or if the target rows aren't uniformly scattered in the index ordering, then this way can be a loss; but when it's a win it can be a big win, too, so "it's a bug, take it out" is an unhelpful opinion.

If a misestimate of this kind is bugging you enough that you're willing to change the query, I think you can fix it like this:

    select ... from foo order by x limit n;
=>
    select ... from (select ... from foo order by x) ss limit n;

The subselect will be planned without awareness of the LIMIT, so you should get a plan using a sort rather than one that bets on the LIMIT being reached quickly.

			regards, tom lane
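Tom's two plan shapes can be mimicked in plain Python (a sketch with invented data; `pred` stands in for the join/filter condition, and a sorted list plays the role of the index). "Bet on the LIMIT" walks rows in index order and stops at the n-th match, while the subselect form materializes every match first and then sorts. They return identical rows; only the amount of work differs with the selectivity of the predicate:

```python
import heapq
import random

random.seed(1)
rows = [(random.random(), i) for i in range(50_000)]   # (c_date, id)
date_index_desc = sorted(rows, reverse=True)           # plays the index role

def limit_via_index_scan(pred, n):
    """The planner's bet: scan in date order, quit at the n-th match."""
    out = []
    for r in date_index_desc:
        if pred(r):
            out.append(r)
            if len(out) == n:
                break
    return out

def limit_via_sort(pred, n):
    """The subselect form: collect all matches first, then sort and cut."""
    return heapq.nlargest(n, (r for r in rows if pred(r)))

common = lambda r: r[1] % 7 == 0      # plenty of matches: the bet pays off
rare   = lambda r: r[1] % 5000 == 0   # few matches: the bet scans nearly all
assert limit_via_index_scan(common, 25) == limit_via_sort(common, 25)
assert limit_via_index_scan(rare, 25) == limit_via_sort(rare, 25)
```

With the `common` predicate the early-exit scan touches only a handful of index entries; with the `rare` one it walks essentially the whole index, which is why the same plan can be either the fastest or the slowest option.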
Antoine Baudoux wrote:
> Here is the explain analyse for the first query; the other is still
> running...
>
> explain analyse select * from t_Event event
> inner join t_Service service on event.service_id=service.id
> inner join t_System system on service.system_id=system.id
> inner join t_Interface interface on system.id=interface.system_id
> inner join t_Network network on interface.network_id=network.id
> where (network.customer_id=1) order by event.c_date desc limit 25
>
> Limit  (cost=11761.44..11761.45 rows=1 width=976) (actual time=0.047..0.047 rows=0 loops=1)
>   ->  Sort  (cost=11761.44..11761.45 rows=1 width=976) (actual time=0.045..0.045 rows=0 loops=1)
>         Sort Key: event.c_date
>         Sort Method:  quicksort  Memory: 17kB
>         ->  Nested Loop  (cost=0.00..11761.43 rows=1 width=976) (actual time=0.024..0.024 rows=0 loops=1)
>               ->  Nested Loop  (cost=0.00..11755.15 rows=1 width=960) (actual time=0.024..0.024 rows=0 loops=1)
>                     ->  Nested Loop  (cost=0.00..191.42 rows=1 width=616) (actual time=0.024..0.024 rows=0 loops=1)
>                           Join Filter: (interface.system_id = service.system_id)
>                           ->  Nested Loop  (cost=0.00..9.29 rows=1 width=576) (actual time=0.023..0.023 rows=0 loops=1)
>                                 ->  Seq Scan on t_network network  (cost=0.00..1.01 rows=1 width=18) (actual time=0.009..0.009 rows=1 loops=1)
>                                       Filter: (customer_id = 1)
>                                 ->  Index Scan using interface_network_id_idx on t_interface interface  (cost=0.00..8.27 rows=1 width=558) (actual time=0.011..0.011 rows=0 loops=1)
>                                       Index Cond: (interface.network_id = network.id)
>                           ->  Seq Scan on t_service service  (cost=0.00..109.28 rows=5828 width=40) (never executed)
>                     ->  Index Scan using event_svc_id_idx on t_event event  (cost=0.00..11516.48 rows=3780 width=344) (never executed)
>                           Index Cond: (event.service_id = service.id)
>               ->  Index Scan using t_system_pkey on t_system system  (cost=0.00..6.27 rows=1 width=16) (never executed)
>                     Index Cond: (system.id = service.system_id)
> Total runtime: 0.362 ms

Are the queries even returning the same results (except for the extra columns coming from t_network)? It looks like in this version, the network-interface join is performed first, which returns zero rows, so the rest of the joins don't need to be performed at all. That's why it's fast.

Which version of PostgreSQL is this, BTW?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Tue, 2008-05-06 at 18:59 +0100, Tom Lane wrote:
> Whether the scan is forwards or backwards has nothing
> to do with it. The planner is using the index ordering
> to avoid having to do a full-table scan and sort.

Oh, I know that. I just noticed that when this happened to us, more often than not, it was a reverse index scan that did it. The thing that annoyed me most was when it happened on an index where, even on a table having 20M rows, the cardinality is < 10 on almost every value of that index. In our case, having a "LIMIT 1" was much worse than just getting back 5 or 10 rows and throwing away everything after the first one.

> but when it's a win it can be a big win, too, so "it's
> a bug, take it out" is an unhelpful opinion.

That's just it... it *can* be a big win. But when it's a loss, you're index-scanning a 20M+ row table for no reason. We got around it, obviously, but it was a definite surprise when a query that normally runs in 0.5ms randomly and inexplicably runs for 4-120s. That's a disaster for a feed loader chewing through a few ten-thousand entries.

But that's just me grousing about not having query hints or being able to tell Postgres to never, ever, ever index-scan certain tables. :)

--
Shaun Thomas
Database Administrator
Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com
On Tue, 2008-05-06 at 18:24 +0100, Antoine Baudoux wrote:
> Isn't the planner fooled by the index on the sorting column?
> If I remove the index the query runs OK.

In your case, for whatever reason, the stats say doing the index scan on the sorted column will give you the results faster. That isn't always the case, and sometimes you can give the same query different where clauses and that same slow index scan will randomly be fast. It's all based on the index distribution and the particular values being fetched.

This goes back to what Tom said. If you know a "miss" can result in terrible performance, it's best to just recode the query to avoid the situation.

> This is crazy: simply by adding a LIMIT to a query, the planning is
> changed in a very bad way. Does the planner use the LIMIT as a sort
> of hint?

Yes. That's actually what tells it the index scan can be a "big win." If it scans the index backwards on values returned from some of your joins, it may just have to find 25 rows, and then it can immediately stop scanning and give you the results. In normal cases, this is a massive performance boost when you have an order clause and are expecting a ton of results (say you're getting the first 25 rows of 10000 or something). But if it would be faster to generate the results and *then* sort, and Postgres thinks otherwise, you're pretty much screwed.

But that's the long answer. You have like 3 ways to get around this now, so pick one. ;)

--
Shaun Thomas
Database Administrator
Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com
> If a misestimate of this kind is bugging you enough that you're
> willing to change the query, I think you can fix it like this:
>
>     select ... from foo order by x limit n;
> =>
>     select ... from (select ... from foo order by x) ss limit n;
>
> The subselect will be planned without awareness of the LIMIT, so you
> should get a plan using a sort rather than one that bets on the LIMIT
> being reached quickly.

I tried that, using a subquery. Unfortunately this does not change anything:

[code]
select * from
(select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
where (interface.network_id=1)
order by event.c_date desc) ss
limit 25

Limit  (cost=147.79..5563.93 rows=25 width=3672)
  ->  Subquery Scan ss  (cost=147.79..2896263.01 rows=13368 width=3672)
        ->  Nested Loop  (cost=147.79..2896129.33 rows=13368 width=958)
              Join Filter: (service.id = event.service_id)
              ->  Index Scan Backward using event_date_idx on t_event event  (cost=0.00..1160633.69 rows=8569619 width=344)
              ->  Materialize  (cost=147.79..147.88 rows=9 width=614)
                    ->  Hash Join  (cost=16.56..147.79 rows=9 width=614)
                          Hash Cond: (service.system_id = system.id)
                          ->  Seq Scan on t_service service  (cost=0.00..109.28 rows=5828 width=40)
                          ->  Hash  (cost=16.55..16.55 rows=1 width=574)
                                ->  Nested Loop  (cost=0.00..16.55 rows=1 width=574)
                                      ->  Index Scan using interface_network_id_idx on t_interface interface  (cost=0.00..8.27 rows=1 width=558)
                                            Index Cond: (network_id = 1)
                                      ->  Index Scan using t_system_pkey on t_system system  (cost=0.00..8.27 rows=1 width=16)
                                            Index Cond: (system.id = interface.system_id)
[/code]

The worst thing about all this is that there are ZERO rows to join with the t_event table. So the planner decides to index-scan 8 million rows, where there is no hope of finding a match! This seems a very, very, very poor decision.
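The planner's arithmetic here is easy to reproduce. If m matching rows were scattered uniformly among N index entries, a backward scan would expect to read roughly k*(N+1)/(m+1) entries before finding k of them. With the estimated ~13,167 matches that looks cheap; with the actual zero matches the scan reads the entire index. (A back-of-the-envelope sketch using the thread's numbers, not PostgreSQL's exact cost model.)

```python
def expected_scan_depth(n_entries, n_matches, limit):
    """Index entries read before `limit` matches turn up, assuming the
    matches are scattered uniformly through the index order."""
    if n_matches < limit:
        return n_entries                 # never enough matches: read it all
    return limit * (n_entries + 1) / (n_matches + 1)

# Planner's view: ~13167 matches among ~8.5M events -> a shallow scan
print(round(expected_scan_depth(8_466_896, 13_167, 25)))   # ~16k entries

# Reality: zero matching rows -> all 8.5M index entries are read
print(expected_scan_depth(8_466_896, 0, 25))
```

That gap, roughly 16 thousand entries expected versus 8.5 million actually read, is the whole difference between the milliseconds the planner predicted and the hours the query takes.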
On Tue, 6 May 2008, Tom Lane wrote:
> If a misestimate of this kind is bugging you enough that you're
> willing to change the query, I think you can fix it like this:
>
>     select ... from foo order by x limit n;
> =>
>     select ... from (select ... from foo order by x) ss limit n;
>
> The subselect will be planned without awareness of the LIMIT, so you
> should get a plan using a sort rather than one that bets on the LIMIT
> being reached quickly.

Surely if that's the case, that in itself is a bug? Apart from being "useful", I mean.

Matthew

--
"Television is a medium because it is neither rare nor well done."
   -- Fred Friendly
Ok, I've tried everything, and the planner keeps choosing index scans when it shouldn't.

Is there a way to disable index scans?

Antoine
On Fri, May 9, 2008 at 1:18 AM, Antoine Baudoux <ab@taktik.be> wrote:
> Ok, I've tried everything, and the planner keeps choosing index scans
> when it shouldn't.
>
> Is there a way to disable index scans?

You can use "set enable_indexscan = off;" as the first command.

I've had one or two reporting queries in the past where it was a necessity to do that before running certain queries on very large datasets where a seq scan would kill performance.