Thread: REPOST: Nested loops row estimates always too high
(SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS HERE)

I am noticing that my queries are spending a lot of time in nested loops.
The table/index row estimates are not bad, but the nested loops can be off
by a factor of 50. In any case, they are always too high.

Are the over-estimations below significant, and if so, is this an
indication of a general configuration problem?

Carlo

select pp.provider_id, pp.provider_practice_id, nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
  on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)
      where zip > '') as nearby
  on f.default_country_code = 'US'
  and f.default_postal_code = nearby.zip
  and pp.facility_address_id is NULL
union
select pp.provider_id, pp.provider_practice_id, nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility_address as fa
  on fa.facility_address_id = pp.facility_address_id
join mdx_core.address as a
  on a.address_id = fa.address_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)
      where zip > '') as nearby
  on a.country_code = 'US'
  and a.postal_code = nearby.zip

Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual time=8634.618..8637.918 rows=907 loops=1)
  ->  Sort  (cost=67605.91..67617.73 rows=4727 width=16) (actual time=8634.615..8635.651 rows=907 loops=1)
        Sort Key: provider_id, provider_practice_id, distance
        ->  Append  (cost=0.00..67317.41 rows=4727 width=16) (actual time=176.056..8632.429 rows=907 loops=1)
              ->  Nested Loop  (cost=0.00..38947.07 rows=3143 width=16) (actual time=176.054..7867.962 rows=872 loops=1)
                    ->  Nested Loop  (cost=0.00..11520.79 rows=8121 width=12) (actual time=169.372..3041.010 rows=907 loops=1)
                          ->  Function Scan on zips_in_mile_range  (cost=0.00..15.00 rows=333 width=40) (actual time=151.479..151.671 rows=66 loops=1)
                                Filter: (zip > ''::text)
                          ->  Index Scan using facility_country_postal_code_idx on facility f  (cost=0.00..34.25 rows=24 width=15) (actual time=4.969..43.740 rows=14 loops=66)
                                Index Cond: ((f.default_country_code = 'US'::bpchar) AND ((f.default_postal_code)::text = zips_in_mile_range.zip))
                    ->  Index Scan using provider_practice_facility_idx on provider_practice pp  (cost=0.00..3.36 rows=1 width=12) (actual time=4.915..5.316 rows=1 loops=907)
                          Index Cond: (f.facility_id = pp.facility_id)
                          Filter: (facility_address_id IS NULL)
              ->  Nested Loop  (cost=0.00..28323.07 rows=1584 width=16) (actual time=170.310..762.472 rows=35 loops=1)
                    ->  Nested Loop  (cost=0.00..7791.77 rows=1579 width=12) (actual time=170.289..612.579 rows=36 loops=1)
                          ->  Nested Loop  (cost=0.00..2595.96 rows=712 width=12) (actual time=167.017..354.261 rows=29 loops=1)
                                ->  Function Scan on zips_in_mile_range  (cost=0.00..15.00 rows=333 width=40) (actual time=150.188..150.312 rows=66 loops=1)
                                      Filter: (zip > ''::text)
                                ->  Index Scan using address_country_postal_code_address_idx on address a  (cost=0.00..7.73 rows=2 width=17) (actual time=2.483..3.086 rows=0 loops=66)
                                      Index Cond: ((a.country_code = 'US'::bpchar) AND ((a.postal_code)::text = zips_in_mile_range.zip))
                          ->  Index Scan using facility_address_address_idx on facility_address fa  (cost=0.00..7.15 rows=12 width=8) (actual time=7.652..8.901 rows=1 loops=29)
                                Index Cond: (a.address_id = fa.address_id)
                    ->  Index Scan using provider_practice_facility_address_idx on provider_practice pp  (cost=0.00..12.80 rows=16 width=12) (actual time=4.156..4.158 rows=1 loops=36)
                          Index Cond: (fa.facility_address_id = pp.facility_address_id)
Total runtime: 8639.066 ms
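(Note how the estimation error compounds through the join: a nested loop's
row estimate is roughly the outer estimate times the per-loop inner
estimate, so for the first branch above,

  estimated:  333 zips x ~24 facilities each  ->  rows=8121
  actual:      66 zips x ~14 facilities each  ->  rows=907

and the same inflation carries into the outer nested loop above it.)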
On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
> (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS HERE)
>
> I am noticing that my queries are spending a lot of time in nested loops.
> The table/index row estimates are not bad, but the nested loops can be off
> by a factor of 50. In any case, they are always too high.
>
> Are the over-estimations below significant, and if so, is this an
> indication of a general configuration problem?

Sounds much like the issue I was seeing as well.

> Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual
> time=8634.618..8637.918 rows=907 loops=1)

You can try rewriting the queries as individual queries to see if it helps.

In my case, I was doing

select a.a, b.b, c.c from
  (select a from x where ...) a        <--- Put as a SRF
  left join (
    select b from y where ...) b       <--- Put as a SRF
  on a.a = b.a
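(To make "put as a SRF" concrete: a minimal sketch, assuming hypothetical
tables x(a integer) and y(a integer, b integer) with placeholder
predicates; the third table from Ow's select list is omitted:)

  create or replace function srf_a() returns setof integer as $$
      select a from x where a > 0;        -- placeholder predicate
  $$ language sql stable;

  create or replace function srf_b() returns setof y as $$
      select * from y where b > 0;        -- placeholder predicate
  $$ language sql stable;

  -- join the function scans instead of the inline subqueries:
  select a.a, b.b
  from srf_a() as a(a)
  left join srf_b() as b on a.a = b.a;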
Has anyone offered any answers to you? No one else has replied to this post.

"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> wrote in message
news:1190616376.17050.51.camel@neuromancer.home.net...
> You can try rewriting the queries as individual queries to see if it helps.
[snip]
On Mon, 2007-09-24 at 14:12 -0400, Carlo Stonebanks wrote:
> Has anyone offered any answers to you? No one else has replied to this post.

Overestimate of selectivity. I guess it's mainly due to my one-to-many
table relationships. I've tried everything from concatenated join columns
(and indexing them) to creating all sorts of indexes, splitting the
(1)-side tables into multiple tables, upping the statistics target to
1000, turning off nestloops, enabling geqo, tweaking the threshold/effort,
and much, much more (as much as I was asked to / suggested to), but still
no luck.

In my case, the individual queries were fast. So, in the end, I made a
SRF and used the SRFs to join each other. This worked better.

[snip]
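(For reference, the knobs mentioned above map to settings like the
following; the values are illustrative, and the table/column names are
placeholders, not from this thread:)

  SET default_statistics_target = 1000;   -- then re-run ANALYZE
  ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 1000;
  ANALYZE some_table;
  SET enable_nestloop = off;              -- discourage nested-loop joins
  SET geqo = on;                          -- genetic query optimizer
  SET geqo_threshold = 8;                 -- use GEQO at 8+ FROM items
  SET geqo_effort = 8;                    -- planning effort, scale 1..10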
My problem is that I think the SRFs are causing my problems. An SRF gets
an automatic row estimate of 1000 rows; add a condition to it and the
planner guesses 333 rows. Even at 333, this is an overestimate of the
number of rows returned.

I'm really disappointed - SRFs are a great way to place the enterprise's
db-centric business logic at the server.

Carlo

-----Original Message-----
From: Ow Mun Heng [mailto:Ow.Mun.Heng@wdc.com]
Sent: September 24, 2007 8:51 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] REPOST: Nested loops row estimates always too high

[snip]
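(Illustrating the defaults Carlo describes: a bare scan over an SRF is
assumed to return 1000 rows, and an inequality filter is assumed to pass
1/3 of them. Costs and widths are elided here; the row estimates match the
plan above:)

  explain select * from mdx_core.zips_in_mile_range('08820', 10);
    Function Scan on zips_in_mile_range  (... rows=1000 ...)

  explain select * from mdx_core.zips_in_mile_range('08820', 10)
    where zip > '';
    Function Scan on zips_in_mile_range  (... rows=333 ...)
      Filter: (zip > ''::text)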
On Tue, 2007-09-25 at 00:53 -0400, Carlo Stonebanks wrote:
> My problem is that I think the SRFs are causing my problems. An SRF gets
> an automatic row estimate of 1000 rows.

That's correct. That's what I see too, though I may return 10K rows of
data (min 10 columns). But it's way faster than the normal joins I do.

> I'm really disappointed - SRFs are a great way to place the enterprise's
> db-centric business logic at the server.

Actually, I think that in general, nested loops, while evil, are just
going to be around. Even in MSSQL, when I'm pulling data, the nested loops
are many, and I presume it's the 8x SMP and the multi-GB RAM that keep the
queries fast anyway.

[snip]
On Tue, Sep 25, 2007 at 12:53:55AM -0400, Carlo Stonebanks wrote:
> My problem is that I think the SRFs are causing my problems. An SRF gets
> an automatic row estimate of 1000 rows; add a condition to it and the
> planner guesses 333 rows. Even at 333, this is an overestimate of the
> number of rows returned.
>
> I'm really disappointed - SRFs are a great way to place the enterprise's
> db-centric business logic at the server.

Fortunately, in 8.3 you can attach a row estimate to the function yourself,
which should most likely fix your problem. Look forward to the first beta :-)

/* Steinar */
--
Homepage: http://www.sesse.net/
On Tue, 2007-09-25 at 11:31 +0200, Steinar H. Gunderson wrote:
> On Tue, Sep 25, 2007 at 12:53:55AM -0400, Carlo Stonebanks wrote:
> [snip]
>
> Fortunately, in 8.3 you can attach a row estimate to the function
> yourself, which should most likely fix your problem. Look forward to the
> first beta :-)

Where can I read more about this new "feature"?
Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes:
> Where can I read more about this new "feature"?

http://developer.postgresql.org/pgdocs/postgres/sql-createfunction.html

http://developer.postgresql.org/pgdocs/postgres/ always has a current
snapshot of CVS-HEAD documentation...

			regards, tom lane
On Wed, 2007-09-26 at 00:02 -0400, Tom Lane wrote:
> Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes:
> > Where can I read more about this new "feature"?
>
> http://developer.postgresql.org/pgdocs/postgres/sql-createfunction.html

I read these two items ...

[snip]
execution_cost
    A positive number giving the estimated execution cost for the
    function, in units of cpu_operator_cost. If the function returns a
    set, this is the cost per returned row. If the cost is not specified,
    1 unit is assumed for C-language and internal functions, and 100 units
    for functions in all other languages. Larger values cause the planner
    to try to avoid evaluating the function more often than necessary.

result_rows
    A positive number giving the estimated number of rows that the planner
    should expect the function to return. This is only allowed when the
    function is declared to return a set. The default assumption is 1000
    rows.
[/snip]
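(In the CREATE FUNCTION syntax those appear as the COST and ROWS clauses.
A hypothetical 8.3 sketch follows; the zip_distance type, the
zip_distances table, and the function body are all invented here, since
the real definition never appears in the thread:)

  create type zip_distance as (zip text, distance numeric);

  create or replace function zips_in_mile_range(text, numeric)
  returns setof zip_distance as $$
      select z.zip, z.distance
      from zip_distances z                -- hypothetical lookup table
      where z.center_zip = $1
        and z.distance <= $2;
  $$ language sql stable
  cost 100     -- per-row cost, in units of cpu_operator_cost
  rows 66;     -- what EXPLAIN ANALYZE actually showed, vs. the default 1000

With that declaration the planner starts from 66 rows for the function
scan instead of 1000, which should pull the nested loop estimates back
toward reality.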