Thread: BUG #6426: Complex query runs 10 times longer with "LIMIT 20"
The following bug has been logged on the website: Bug reference: 6426 Logged by: Andrew Schetinin Email address: aschetinin@gmail.com PostgreSQL version: 9.1.2 Operating system: Debian Linux Description:=20=20=20=20=20=20=20=20 I have a complex query that, when unlimited, runs in 2.5 seconds and returns 400+ records. The same query, with added "LIMIT 20 OFFSET 0" at the end, runs 25 seconds! (and returns correct 20 records) I think there is a bug in the query optimizer. Other variations of this query (with slightly different constraints and joins) work fast with and without LIMIT.=20 The same query works fast in PostgreSQL 8.4 (tested on another machine with Ubuntu and not exactly the same but very-very similar database). I can provide the query execution plans of the two cases (as long as they are not published on the Web); but not the database because there is customer's data in it. Please let me know if anybody is interested in researching the problem. Regards, Andrew Schetinin
aschetinin@gmail.com wrote: > The following bug has been logged on the website: > > Bug reference: 6426 > Logged by: Andrew Schetinin > Email address: aschetinin@gmail.com > PostgreSQL version: 9.1.2 > Operating system: Debian Linux > Description: > > I have a complex query that, when unlimited, runs in 2.5 seconds and returns > 400+ records. > > The same query, with added "LIMIT 20 OFFSET 0" at the end, runs 25 seconds! > (and returns correct 20 records) > > I think there is a bug in the query optimizer. Other variations of this > query (with slightly different constraints and joins) work fast with and > without LIMIT. > > The same query works fast in PostgreSQL 8.4 (tested on another machine with > Ubuntu and not exactly the same but very-very similar database). > > I can provide the query execution plans of the two cases (as long as they > are not published on the Web); but not the database because there is > customer's data in it. > > Please let me know if anybody is interested in researching the problem. > > Regards, > > Andrew Schetinin > > > Hi Andrew, I posted for another post, its may give you a workaround. I still not fully understand how PG choose execute plan that slow down so much. I had the same situation in one of my query. Use the subquery can speed up almost by 100 times faster. explain analyse select FileId as "FileId", ESDT as "ESDT",1 as "Position" from V_FileMeta_L3 where Archiveset = 61000 and ESDT= 'ESDT123' and Source = 'SOURCE1234' and ( (StartTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) or (EndTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) ) order by starttime limit 1; Limit (cost=0.00..15.20 rows=1 width=22) (actual time=200.048..200.048 rows=1 loops=1) -> Nested Loop (cost=0.00..117596.32 rows=7736 width=22) (actual time=200.046..200.046 rows=1 loops=1) -> Index Scan using ak_filemeta_l3_esdt_starttime_endtime on filemeta_l3 b (cost=0.00..77200.55 rows=7736 width=22) (actual time=199.986..199.989 rows=2 loops=1) Index Cond: ((esdt)::text = 'ROLPT'::text) Filter: (((source)::text = 'OMPS-NPP'::text) AND (((starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (starttime <= '2012- 05-07 07:41:27'::timestamp without time zone)) OR ((endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07:41:27'::ti mestamp without time zone)))) -> Index Scan using pk_filemeta_archiveset on filemeta_archiveset a (cost=0.00..5.21 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=2) Index Cond: ((a.fileid = b.fileid) AND (a.archiveset = 61000)) Total runtime: 200.102 ms (8 rows) explain analyse select FileId as "FileId", ESDT as "ESDT",1 as "Position" from V_FileMeta_L3 where FileId in (select fileid from V_FileMeta_L3 where Archiveset = 61000 and ESDT= 'ESDT123' and Source = 'SOUCE1234' and ( (StartTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) or (EndTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) ) order by starttime) limit 1; Limit (cost=53038.95..53039.41 rows=1 width=14) (actual time=2.502..2.502 rows=1 loops=1) -> Nested Loop (cost=53038.95..6661196.32 rows=14611464 width=14) (actual time=2.480..2.480 rows=1 loops=1) -> Nested Loop (cost=53038.95..53716.80 rows=2549631 width=18) (actual time=2.464..2.464 rows=1 loops=1) -> HashAggregate (cost=53038.95..53040.95 rows=200 width=4) (actual time=2.445..2.445 rows=1 loops=1) -> Sort (cost=52923.05..52942.37 rows=7727 width=12) (actual time=2.346..2.372 rows=43 loops=1) Sort Key: b.starttime Sort Method: quicksort Memory: 27kB -> Nested Loop (cost=209.87..52424.05 rows=7727 width=12) (actual time=0.358..2.251 rows=43 loops=1) -> Bitmap Heap Scan on filemeta_l3 b (cost=209.87..12077.66 rows=7727 width=12) (actual time=0.262..0.685 rows=108 loops=1) Recheck Cond: ((((esdt)::text = 'ROLPT'::text) AND (starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (starttime <= '2012-05-07 07:41:2 7'::timestamp without time zone)) OR (((esdt)::text = 'ROLPT'::text) AND (endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07:41:27'::timestamp without time zone))) Filter: ((source)::text = 'OMPS-NPP'::text) -> BitmapOr (cost=209.87..209.87 rows=9895 width=0) (actual time=0.223..0.223 rows=0 loops=1) -> Bitmap Index Scan on ak_filemeta_l3_esdt_starttime_endtime (cost=0.00..107.72 rows=4961 width=0) (actual time=0.127..0.127 rows=106 loops=1) Index Cond: (((esdt)::text = 'ROLPT'::text) AND (starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (starttime <= '2012-05-0 7 07:41:27'::timestamp without time zone)) -> Bitmap Index Scan on ak_filemeta_l3_esdt_endtime (cost=0.00..98.29 rows=4934 width=0) (actual time=0.093..0.093 rows=108 loops=1) Index Cond: (((esdt)::text = 'ROLPT'::text) AND (endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07 :41:27'::timestamp without time zone)) -> Index Scan using pk_filemeta_archiveset on filemeta_archiveset a (cost=0.00..5.21 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=108) Index Cond: ((a.fileid = b.fileid) AND (a.archiveset = 61000)) -> Index Scan using pk_filemeta_l3 on filemeta_l3 b (cost=0.00..3.37 rows=1 width=14) (actual time=0.015..0.015 rows=1 loops=1) Index Cond: (b.fileid = a.fileid) -> Index Scan using ak_filemeta_archiveset_fileid on filemeta_archiveset a (cost=0.00..2.52 rows=6 width=4) (actual time=0.012..0.012 rows=1 loops=1) Index Cond: (a.fileid = b.fileid) Total runtime: 2.711 ms Hope this help. Best Regards, Alex Lai -- Best regards, Alex Lai OMI SIPS DBA ADNET Systems , Inc. 7515 Mission Drive, Suite A100 Lanham, MD 20706 301-352-4657 (phone) 301-352-0437 (fax) alai@sesda2.com
Andrew Schetinin wrote: > Hi Alex, > > Thank you. I played with subqueries, but in this case I did not see a > good enough improvement. > > In my specific case, what I've seen from the query execution plans, is > that without LIMIT the query uses Hash Joins, but once I add LIMIT, it > starts using Nested Loop Joins almost everywhere. > > Regards, > > Andrew > > On Wed, Feb 1, 2012 at 9:42 PM, Alex Lai <alai@sesda2.com > <mailto:alai@sesda2.com>> wrote: > > Hi Andrew, > I posted for another post, its may give you a workaround. > I still not fully understand how PG choose execute plan that slow > down so much. > > I had the same situation in one of my query. > Use the subquery can speed up almost by 100 times faster. > > .......... > > > > Hope this help. > > > Best Regards, > Alex Lai > > -- > Best regards, > > > Alex Lai > OMI SIPS DBA ADNET Systems , Inc. 7515 Mission Drive, Suite A100 > Lanham, MD 20706 301-352-4657 (phone) 301-352-0437 (fax) > alai@sesda2.com <mailto:alai@sesda2.com> > > > > > -- > -- > Andrew Schetinin Hi Andrew, I am interested in see your query. -- Best regards, Alex Lai OMI SIPS DBA ADNET Systems , Inc. 7515 Mission Drive, Suite A100 Lanham, MD 20706 301-352-4657 (phone) 301-352-0437 (fax) alai@sesda2.com
Hi Alex, Thank you. I played with subqueries, but in this case I did not see a good enough improvement. In my specific case, what I've seen from the query execution plans, is that without LIMIT the query uses Hash Joins, but once I add LIMIT, it starts using Nested Loop Joins almost everywhere. Regards, Andrew On Wed, Feb 1, 2012 at 9:42 PM, Alex Lai <alai@sesda2.com> wrote: > Hi Andrew, > I posted for another post, its may give you a workaround. > I still not fully understand how PG choose execute plan that slow down so > much. > > I had the same situation in one of my query. > Use the subquery can speed up almost by 100 times faster. > > .......... > > Hope this help. > > > Best Regards, > Alex Lai > > -- > Best regards, > > > Alex Lai > OMI SIPS DBA ADNET Systems , Inc. 7515 Mission Drive, Suite A100 Lanham, > MD 20706 301-352-4657 (phone) 301-352-0437 (fax) alai@sesda2.com > > -- -- Andrew Schetinin
Andrew Schetinin <aschetinin@gmail.com> writes: > In my specific case, what I've seen from the query execution plans, is that > without LIMIT the query uses Hash Joins, but once I add LIMIT, it starts > using Nested Loop Joins almost everywhere. Usually, that's an appropriate change for a small LIMIT. It's certainly not a priori evidence of a planner bug. If you want useful comments about this, please review http://wiki.postgresql.org/wiki/Slow_Query_Questions about how to provide an adequate description of your problem. regards, tom lane
Hi Tom, Thanks for pointing the FAQ out, I did not see it. I especially liked the link to http://explain.depesz.com - it's a useful tool. I succeeded to fix my problem by changing the order of JOINs (the query remained exactly the same otherwise). According to EXPLAIN ANALIZE, it eliminated those problematic nested loop joins. BTW, changing the order of JOINs did not affect the times of the query without LIMIT - it works the same way, while for the query with LIMIT it fixed the problem. Previously I always thought that the order of JOINs or conditions in WHERE is irrelevant, and query optimizer rearranges the order according to its logic. Now it appears that sometimes it may be important. Regards, Andrew On Wed, Feb 1, 2012 at 11:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Schetinin <aschetinin@gmail.com> writes: > > In my specific case, what I've seen from the query execution plans, is > that > > without LIMIT the query uses Hash Joins, but once I add LIMIT, it starts > > using Nested Loop Joins almost everywhere. > > Usually, that's an appropriate change for a small LIMIT. It's certainly > not a priori evidence of a planner bug. > > If you want useful comments about this, please review > http://wiki.postgresql.org/wiki/Slow_Query_Questions > about how to provide an adequate description of your problem. > > regards, tom lane > -- -- Andrew Schetinin
Andrew Schetinin <aschetinin@gmail.com> writes: > Previously I always thought that the order of JOINs or conditions in WHERE > is irrelevant, and query optimizer rearranges the order according to its > logic. Now it appears that sometimes it may be important. If you have more than join_collapse_limit JOINs in the query, the order can matter. regards, tom lane