Thread: Planner doesn't look at LIMIT?
Hello,

I have PostgreSQL 8.0.3 running on a "workstation" with 768 MB of RAM, under FreeBSD. And I have a 47-million-row table:

qnex=# explain select * from log;
                               QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
(1 row)

...which is joined with a few smaller ones, like:

qnex=# explain select * from useragents;
                             QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on useragents  (cost=0.00..9475.96 rows=364896 width=96)
(1 row)

shared_buffers = 5000
random_page_cost = 3
work_mem = 102400
effective_cache_size = 60000

Now, if I do a SELECT:

qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=15912.20..15912.31 rows=1 width=272)
   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
         Hash Cond: ("outer".useragent_id = "inner".useragent_id)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
         ->  Hash  (cost=9475.96..9475.96 rows=364896 width=96)
               ->  Seq Scan on useragents  (cost=0.00..9475.96 rows=364896 width=96)
(6 rows)

Or:

qnex=# EXPLAIN SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=15912.20..15912.31 rows=1 width=272)
   ->  Hash Left Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
         Hash Cond: ("outer".useragent_id = "inner".useragent_id)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
         ->  Hash  (cost=9475.96..9475.96 rows=364896 width=96)
               ->  Seq Scan on useragents  (cost=0.00..9475.96 rows=364896 width=96)
(6 rows)

Time: 2.688 ms

...the query seems to run forever (it's hashing 47 million rows!).

If I set enable_hashjoin=false:

qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336 width=272) (actual time=74.204..74.204 rows=1 loops=1)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180) (actual time=23.270..23.270 rows=1 loops=1)
         ->  Index Scan using useragents_pkey on useragents  (cost=0.00..3.02 rows=1 width=96) (actual time=50.867..50.867 rows=1 loops=1)
               Index Cond: ("outer".useragent_id = useragents.useragent_id)
 Total runtime: 74.483 ms

...which is way faster.
Of course, if I did:

qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents WHERE logid = (SELECT logid FROM log LIMIT 1);
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.04..6.09 rows=1 width=272) (actual time=61.403..61.419 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.029..0.032 rows=1 loops=1)
           ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=4) (actual time=0.023..0.023 rows=1 loops=1)
   ->  Index Scan using log_pkey on log  (cost=0.00..3.02 rows=1 width=180) (actual time=61.316..61.319 rows=1 loops=1)
         Index Cond: (logid = $0)
   ->  Index Scan using useragents_pkey on useragents  (cost=0.00..3.02 rows=1 width=96) (actual time=0.036..0.042 rows=1 loops=1)
         Index Cond: ("outer".useragent_id = useragents.useragent_id)
 Total runtime: 61.741 ms
(9 rows)

...it would be fast. I tried tweaking cpu_*, work_mem, effective_cache_size and so on, but without any luck. The 47-million-row table is huge compared to useragents (I actually need to join the log with three tables similar to useragents, and create a view out of it). I also tried using LEFT/RIGHT JOINs instead of (inner) JOINs... Of course the database is freshly vacuum-analyzed, and the statistics target is set at 50...

My view of the problem is that the planner ignores the LIMIT part. It assumes it _needs_ to return all 47 million rows joined with the useragents table, so the hash join is the only sane approach. But chances are that unless I use LIMIT 200000, the nested loop will be much faster.

Any ideas how to make it work (other than rewriting the query to use subselects, using explicit id rows, or disabling hashjoin completely)? Or is this a bug?

Regards,
   Dawid
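An aside on the workaround Dawid mentions: rather than disabling hash joins for the whole session, the setting can be confined to a single transaction with SET LOCAL. A minimal sketch against the tables above:

    BEGIN;
    SET LOCAL enable_hashjoin = off;   -- reverts automatically at COMMIT/ROLLBACK
    SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
    COMMIT;

Other queries in the session keep the planner's full set of join strategies.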
Which row do you want? Just 'a row' at random? I presume you want the N latest rows?

In that case you should use an ORDER BY on an indexed field; the serial primary key will do nicely (ORDER BY id DESC). It's indexed, so the query will use the index and it will fly.

> Any ideas how to make it work (other than rewriting the query to use
> subselects, use explicit id-rows, disabling hashjoin completely)?
> Or is this a bug?
>
> Regards,
>    Dawid
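A sketch of this suggestion against Dawid's tables, assuming logid is the serial primary key (the log_pkey index appears in the plans above):

    SELECT *
    FROM log NATURAL LEFT JOIN useragents
    ORDER BY logid DESC
    LIMIT 1;

With the sort pinned to an indexed column, the planner can satisfy the ORDER BY with a backward scan of log_pkey and stop after the first joined row, instead of picking an arbitrary row.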
Dawid Kuroczko <qnex42@gmail.com> writes:
> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

> If I set enable_hashjoin=false:

> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;

> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
> rows=1 loops=1)
>   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
> width=272) (actual time=74.204..74.204 rows=1 loops=1)

This is quite strange. The nestloop plan definitely should be preferred in the context of the LIMIT, considering that it has far lower estimated cost. And it is preferred in simple tests for me. It seems there must be something specific to your installation that's causing the planner to go wrong. Can you develop a self-contained test case that behaves this way for you?

I recall we saw a similar complaint a month or two back, but the complainant never followed up with anything useful for tracking down the problem.

			regards, tom lane
Dawid Kuroczko wrote:
> work_mem = 102400

> ...I tried tweaking cpu_*, work_mem, effective_cache and so on, but without
> any luck.

I'm hoping you didn't tweak it enough! I posted something similar a while ago, but haven't since got around to figuring out a useful test case to send to the list.

Try reducing your work_mem down to 1000 or so and things should start doing what you expect.

  Sam
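A one-session way to try this, using the settings from the original post (a sketch; the value is in kilobytes, so this drops work_mem from 100 MB to about 1 MB):

    SET work_mem = 1000;
    EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

Presumably the smaller work_mem raises the hash join's estimated cost (the build table no longer fits in a single batch), which changes which plan wins the comparison.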
On 7/22/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dawid Kuroczko <qnex42@gmail.com> writes:
> > qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>
> > Limit  (cost=15912.20..15912.31 rows=1 width=272)
> >   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.  It seems there must
> be something specific to your installation that's causing the planner to
> go wrong.  Can you develop a self-contained test case that behaves this
> way for you?

Why, certainly. I also tested it on Gentoo Linux with PostgreSQL 8.0.1 (yeah, a bit older one), and the behaviour is the same.

The test looks like this:

-- First let's make a "small" lookup table -- 400000 rows.
CREATE TABLE lookup (
    lookup_id serial PRIMARY KEY,
    value integer NOT NULL
);
INSERT INTO lookup (value) SELECT * FROM generate_series(1, 400000);
VACUUM ANALYZE lookup;

-- Then let's make a huge data table...
CREATE TABLE huge_data (
    huge_data_id serial PRIMARY KEY,
    lookup_id integer NOT NULL
);
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM lookup;
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --    800 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  1 600 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  3 200 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  6 400 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000

-- You may want to put ANALYZE and EXPLAIN between each of these
-- steps.  In my case, at 12.8 mln rows PostgreSQL seems to go for the
-- hashjoin every time.  YMMV, so you may try to push it up to 1024 mln rows.
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 25 600 000
ANALYZE huge_data;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;

My EXPLAIN from Linux (an SMP P-III box), with PostgreSQL 8.0.1, taken while building this test case:

qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.21 rows=1 width=12)
   ->  Nested Loop  (cost=0.00..19557596.04 rows=6094777 width=12)
         ->  Seq Scan on huge_data  (cost=0.00..95372.42 rows=6399942 width=8)
         ->  Index Scan using lookup_pkey on lookup  (cost=0.00..3.02 rows=1 width=8)
               Index Cond: ("outer".lookup_id = lookup.lookup_id)
(5 rows)

Time: 4,333 ms

qnex=# INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000
INSERT 0 6400000
Time: 501014,692 ms
qnex=# ANALYZE huge_data;
ANALYZE
Time: 4243,453 ms
qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=11719.00..11719.09 rows=1 width=12)
   ->  Hash Join  (cost=11719.00..1212739.73 rows=12800185 width=12)
         Hash Cond: ("outer".lookup_id = "inner".lookup_id)
         ->  Seq Scan on huge_data  (cost=0.00..190747.84 rows=12800184 width=8)
         ->  Hash  (cost=5961.00..5961.00 rows=400000 width=8)
               ->  Seq Scan on lookup  (cost=0.00..5961.00 rows=400000 width=8)
(6 rows)

Regards,
   Dawid
I wrote:
> Dawid Kuroczko <qnex42@gmail.com> writes:
>> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

>> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>>   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

>> If I set enable_hashjoin=false:

>> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;

>> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
>> rows=1 loops=1)
>>   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
>> width=272) (actual time=74.204..74.204 rows=1 loops=1)

> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.

After a suitable period of contemplating my navel, I figured out what is going on here: the total costs involved are large enough that the still-fairly-high startup cost of the hash is disregarded by compare_fuzzy_path_costs(), and so the nestloop is discarded as not having any significant potential advantage in startup time.

I think that this refutes the original scheme of using the same fuzz factor for both startup and total cost comparisons, and therefore propose the attached patch.

Comments?

			regards, tom lane

*** src/backend/optimizer/util/pathnode.c.orig	Fri Jul 15 13:09:25 2005
--- src/backend/optimizer/util/pathnode.c	Fri Jul 22 12:08:25 2005
***************
*** 98,157 ****
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
-     Cost        fuzz;
-
      /*
!      * The fuzz factor is set at one percent of the smaller total_cost,
!      * but not less than 0.01 cost units (just in case total cost is
!      * zero).
       *
       * XXX does this percentage need to be user-configurable?
       */
-     fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
-     fuzz = Max(fuzz, 0.01);
-
      if (criterion == STARTUP_COST)
      {
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      else
      {
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      return 0;
  }
--- 98,138 ----
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
      /*
!      * We use a fuzz factor of 1% of the smaller cost.
       *
       * XXX does this percentage need to be user-configurable?
       */
      if (criterion == STARTUP_COST)
      {
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;
      }
      else
      {
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;
      }
      return 0;
  }
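Working through Dawid's numbers under the old rule makes the failure concrete (a back-of-the-envelope check using the costs from the plans above, not part of the patch):

    fuzz               = 1% of the smaller total_cost
                       = 0.01 * 5328368.96   (the hash join's total)
                       ~ 53283.69
    startup difference = 15912.20 - 0.00     (hash join vs. nestloop)
                       = 15912.20  <  fuzz

So on the STARTUP_COST criterion the two paths compared as "equal", and the tie was broken on total cost, where the hash join (about 5.3M) easily beats the nestloop (about 144M). With the patch, the startup comparison uses 1% of the startup costs themselves (0.00 * 1.01 = 0 < 15912.20), so the nestloop's startup advantage is visible again and it survives for the LIMIT case.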
On Fri, 2005-07-22 at 12:20 -0400, Tom Lane wrote:
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Looks good. I think it explains a few other weird perf reports also.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> Looks good. I think it explains a few other weird perf reports also.

Could be. I went back to look at Sam Mason's report from about three weeks ago, and it definitely seems to explain his issue. The "fuzzy cost comparison" logic is new in 8.0, so it hasn't had all that much testing...

			regards, tom lane
On 7/22/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > This is quite strange.  The nestloop plan definitely should be preferred
> > in the context of the LIMIT, considering that it has far lower estimated
> > cost.  And it is preferred in simple tests for me.
>
> After a suitable period of contemplating my navel, I figured out
> what is going on here: the total costs involved are large enough that
> the still-fairly-high startup cost of the hash is disregarded by
> compare_fuzzy_path_costs(), and so the nestloop is discarded as not
> having any significant potential advantage in startup time.
>
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Works great!!! With LIMIT below 4 000 000 rows (it's a 47-million-row table) it prefers nested loops; then it starts to introduce merge joins.

Regards,
   Dawid
Tom Lane wrote:
> Could be.  I went back to look at Sam Mason's report from about three weeks
> ago, and it definitely seems to explain his issue.

I've just built a patched version as well, and it appears to be doing what I think is the right thing now, i.e. actually picking the plan with the lower cost.

Thanks!

  Sam
I have a case that I thought was an example of this issue, and that this patch would correct. I applied this patch to an 8.0.3 source distribution, but it didn't seem to solve my problem. In a nutshell, I have a LIMIT query where the planner seems to favor a merge join over a nested loop. I've simplified the query as much as possible:

itvtrackdata3=> \d tableA
   Table "public.tableA"
 Column |   Type   | Modifiers
--------+----------+-----------
 foo    | bigint   | not null
 bar    | smallint | not null
 bap    | bigint   | not null
 bip    | bigint   | not null
 bom    | bigint   | not null
Indexes:
    "idx_tableA_bip" btree (bip) WHERE (bip = 9000000000000000000::bigint)
    "idx_tableA_foo" btree (foo)

itvtrackdata3=> \d tableB
    Table "tableB"
 Column  |   Type   | Modifiers
---------+----------+-----------
 bim     | bigint   | not null
 bif     | smallint | not null
 baf     | smallint | not null
 bof     | smallint | not null
 buf     | smallint | not null
 foo     | bigint   | not null
Indexes:
    "idx_tableB_bim" btree ("bim", foo)

itvtrackdata3=> set default_statistics_target to 1000;
SET
Time: 0.448 ms
itvtrackdata3=> analyze tableA;
ANALYZE
Time: 4237.151 ms
itvtrackdata3=> analyze tableB;
ANALYZE
Time: 46672.939 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA WHERE bim>=72555896091359 AND bim<72555935412959 AND bim=bap ORDER BY bim ASC LIMIT 1;
                                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=149626.57..252987.71 rows=1 width=50) (actual time=5684.013..5684.013 rows=1 loops=1)
   ->  Merge Join  (cost=149626.57..252987.71 rows=1 width=50) (actual time=5684.012..5684.012 rows=1 loops=1)
         Merge Cond: (("outer"."bim" = "inner"."bap") AND ("outer".foo = "inner".foo))
         ->  Index Scan using idx_tableB_bim on tableB  (cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.059 rows=29 loops=1)
               Index Cond: (("bim" >= 72555896091359::bigint) AND ("bim" < 72555935412959::bigint))
         ->  Sort  (cost=149626.57..151523.94 rows=758948 width=34) (actual time=5099.300..5442.825 rows=560856 loops=1)
               Sort Key: tableA."bap", tableA.foo
               ->  Seq Scan on tableA  (cost=0.00..47351.48 rows=758948 width=34) (actual time=0.021..1645.204 rows=758948 loops=1)
 Total runtime: 5706.655 ms
(9 rows)

Time: 5729.984 ms
itvtrackdata3=> set enable_mergejoin to false;
SET
Time: 0.373 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA WHERE bim>=72555896091359 AND bim<72555935412959 AND bim=bap ORDER BY bim ASC LIMIT 1;
                                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..432619.68 rows=1 width=50) (actual time=11.149..11.150 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..432619.68 rows=1 width=50) (actual time=11.148..11.148 rows=1 loops=1)
         Join Filter: ("outer"."bim" = "inner"."bap")
         ->  Index Scan using idx_tableB_bim on tableB  (cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.062 rows=29 loops=1)
               Index Cond: (("bim" >= 72555896091359::bigint) AND ("bim" < 72555935412959::bigint))
         ->  Index Scan using idx_tableA_foo on tableA  (cost=0.00..6.01 rows=1 width=34) (actual time=0.007..0.379 rows=1 loops=29)
               Index Cond: ("outer".foo = tableA.foo)
 Total runtime: 11.215 ms
(8 rows)

Time: 32.007 ms

Have I just flubbed the patch, or is there something else going on here?
Thanks,

--Ian

On Fri, 2005-07-22 at 12:20, Tom Lane wrote:
> After a suitable period of contemplating my navel, I figured out
> what is going on here: the total costs involved are large enough that
> the still-fairly-high startup cost of the hash is disregarded by
> compare_fuzzy_path_costs(), and so the nestloop is discarded as not
> having any significant potential advantage in startup time.
>
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?
>
> 			regards, tom lane
>
> [patch snipped; quoted in full upthread]
Ian Westmacott <ianw@intellivid.com> writes:
> In a nutshell, I have a LIMIT query where the planner
> seems to favor a merge join over a nested loop.

The planner is already estimating only one row out of the join, and so the LIMIT doesn't affect its cost estimates at all.

It appears to me that the reason the nestloop plan is fast is just chance: a suitable matching row is found very early in the scan of tableB, so that the indexscan on it can stop after 29 rows, instead of having to go through all 55000 rows in the given range of bim. If it'd have had to go through, say, half of the rows to find a match, the sort/merge plan would show up a lot better. If this wasn't chance, but was expected because there are many matching rows and not only one, then there's a statistical problem.

			regards, tom lane
On Wed, 2005-08-10 at 18:55, Tom Lane wrote:
> Ian Westmacott <ianw@intellivid.com> writes:
> > In a nutshell, I have a LIMIT query where the planner
> > seems to favor a merge join over a nested loop.
>
> The planner is already estimating only one row out of the join, and so
> the LIMIT doesn't affect its cost estimates at all.
>
> It appears to me that the reason the nestloop plan is fast is just
> chance: a suitable matching row is found very early in the scan of
> tableB, so that the indexscan on it can stop after 29 rows, instead
> of having to go through all 55000 rows in the given range of bim.
> If it'd have had to go through, say, half of the rows to find a match,
> the sort/merge plan would show up a lot better.

Oh, I see. Thanks, that clears up some misconceptions I had about the explain output.

> If this wasn't chance, but was expected because there are many matching
> rows and not only one, then there's a statistical problem.

Well, there are in fact almost 300 of them in this case. So I guess what I need to do is give the planner more information to correctly predict that.

Thanks,

--Ian
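If the join estimate is the thing to fix, one place to start is what the planner believes about the join columns. A diagnostic sketch against the standard pg_stats view (identifiers as in the anonymized schema above, quoted names assumed):

    SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename IN ('tableA', 'tableB')
      AND attname IN ('foo', 'bap', 'bim');

The estimated selectivity of bim = bap and the foo = foo join condition comes from these per-column statistics; if n_distinct is badly off for one of them, raising the statistics target for just that column (ALTER TABLE ... ALTER COLUMN ... SET STATISTICS, followed by ANALYZE) is the usual knob.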