Thread: REPOST: Nested loops row estimates always too high

REPOST: Nested loops row estimates always too high

From
"Carlo Stonebanks"
Date:
(SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS HERE)

I am noticing that my queries are spending a lot of time in nested loops.
The table/index row estimates are not bad, but the nested loops can be off
by a factor of 50. In any case, they are always too high.

Are the over-estimations below significant, and if so, is this an indication
of a general configuration problem?

Carlo


select
   pp.provider_id,
   pp.provider_practice_id,
   nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10) where zip > '')
as nearby
on f.default_country_code = 'US'
   and f.default_postal_code = nearby.zip
   and pp.facility_address_id is NULL
union select
   pp.provider_id,
   pp.provider_practice_id,
   nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility_address as fa
on fa.facility_address_id = pp.facility_address_id
join mdx_core.address as a
on a.address_id = fa.address_id
join (select * from mdx_core.zips_in_mile_range('08820', 10) where zip > '')
as nearby
on a.country_code = 'US'
and a.postal_code = nearby.zip

Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual
time=8634.618..8637.918 rows=907 loops=1)
  ->  Sort  (cost=67605.91..67617.73 rows=4727 width=16) (actual
time=8634.615..8635.651 rows=907 loops=1)
        Sort Key: provider_id, provider_practice_id, distance
        ->  Append  (cost=0.00..67317.41 rows=4727 width=16) (actual
time=176.056..8632.429 rows=907 loops=1)
              ->  Nested Loop  (cost=0.00..38947.07 rows=3143 width=16)
(actual time=176.054..7867.962 rows=872 loops=1)
                    ->  Nested Loop  (cost=0.00..11520.79 rows=8121
width=12) (actual time=169.372..3041.010 rows=907 loops=1)
                          ->  Function Scan on zips_in_mile_range
(cost=0.00..15.00 rows=333 width=40) (actual time=151.479..151.671 rows=66
loops=1)
                                Filter: (zip > ''::text)
                          ->  Index Scan using
facility_country_postal_code_idx on facility f  (cost=0.00..34.25 rows=24
width=15) (actual time=4.969..43.740 rows=14 loops=66)
                                Index Cond: ((f.default_country_code =
'US'::bpchar) AND ((f.default_postal_code)::text = zips_in_mile_range.zip))
                    ->  Index Scan using provider_practice_facility_idx on
provider_practice pp  (cost=0.00..3.36 rows=1 width=12) (actual
time=4.915..5.316 rows=1 loops=907)
                          Index Cond: (f.facility_id = pp.facility_id)
                          Filter: (facility_address_id IS NULL)
              ->  Nested Loop  (cost=0.00..28323.07 rows=1584 width=16)
(actual time=170.310..762.472 rows=35 loops=1)
                    ->  Nested Loop  (cost=0.00..7791.77 rows=1579 width=12)
(actual time=170.289..612.579 rows=36 loops=1)
                          ->  Nested Loop  (cost=0.00..2595.96 rows=712
width=12) (actual time=167.017..354.261 rows=29 loops=1)
                                ->  Function Scan on zips_in_mile_range
(cost=0.00..15.00 rows=333 width=40) (actual time=150.188..150.312 rows=66
loops=1)
                                      Filter: (zip > ''::text)
                                ->  Index Scan using
address_country_postal_code_address_idx on address a  (cost=0.00..7.73
rows=2 width=17) (actual time=2.483..3.086 rows=0 loops=66)
                                      Index Cond: ((a.country_code =
'US'::bpchar) AND ((a.postal_code)::text = zips_in_mile_range.zip))
                          ->  Index Scan using facility_address_address_idx
on facility_address fa  (cost=0.00..7.15 rows=12 width=8) (actual
time=7.652..8.901 rows=1 loops=29)
                                Index Cond: (a.address_id = fa.address_id)
                    ->  Index Scan using
provider_practice_facility_address_idx on provider_practice pp
(cost=0.00..12.80 rows=16 width=12) (actual time=4.156..4.158 rows=1
loops=36)
                          Index Cond: (fa.facility_address_id =
pp.facility_address_id)
Total runtime: 8639.066 ms



Re: REPOST: Nested loops row estimates always too high

From
Ow Mun Heng
Date:
On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
> (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS HERE)
>
> I am noticing that my queries are spending a lot of time in nested loops.
> The table/index row estimates are not bad, but the nested loops can be off
> by a factor of 50. In any case, they are always too high.
>
> Are the over-estimations below significant, and if so, is this an indication
> of a general configuration problem?
Sounds much like the issue I was seeing as well.

>
> Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual
> time=8634.618..8637.918 rows=907 loops=1)

You can to rewrite the queries to individual queries to see it if helps.

In my case, I was doing

select a.a,b.b,c.c from
(select a from x where) a <--- Put as a SRF
left join (
select b from y where ) b <--- Put as a SRF
on a.a = b.a




Re: REPOST: Nested loops row estimates always too high

From
"Carlo Stonebanks"
Date:
Has anyone offered any answers to you? No one else has replied to this post.


"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> wrote in message
news:1190616376.17050.51.camel@neuromancer.home.net...
> On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
>> (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS
>> HERE)
>>
>> I am noticing that my queries are spending a lot of time in nested loops.
>> The table/index row estimates are not bad, but the nested loops can be
>> off
>> by a factor of 50. In any case, they are always too high.
>>
>> Are the over-estimations below significant, and if so, is this an
>> indication
>> of a general configuration problem?
> Sounds much like the issue I was seeing as well.
>
>>
>> Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual
>> time=8634.618..8637.918 rows=907 loops=1)
>
> You can to rewrite the queries to individual queries to see it if helps.
>
> In my case, I was doing
>
> select a.a,b.b,c.c from
> (select a from x where) a <--- Put as a SRF
> left join (
> select b from y where ) b <--- Put as a SRF
> on a.a = b.a
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


Re: REPOST: Nested loops row estimates always too high

From
Ow Mun Heng
Date:
On Mon, 2007-09-24 at 14:12 -0400, Carlo Stonebanks wrote:
> Has anyone offered any answers to you? No one else has replied to this post.

Overestimate of selectivity. I guess it's mainly due to my one to many
table relationships. I've tried everything from concatenated join
columns and indexing it to creating all sorts of indexes and splitting
the (1) tables into multiple tables and upping the indexes to 1000 and
turning of nestloops/enabling geqo/ tweaking the threshold/effort and
much much more (as much as I was asked to/suggested to) but still no
luck.

In my case, the individual queries were fast. So, In then end, I made a
SRF and used the SRFs to join each other. This worked better.


>
>
> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> wrote in message
> news:1190616376.17050.51.camel@neuromancer.home.net...
> > On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
> >> (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS
> >> HERE)
> >>
> >> I am noticing that my queries are spending a lot of time in nested loops.
> >> The table/index row estimates are not bad, but the nested loops can be
> >> off
> >> by a factor of 50. In any case, they are always too high.
> >>
> >> Are the over-estimations below significant, and if so, is this an
> >> indication
> >> of a general configuration problem?
> > Sounds much like the issue I was seeing as well.
> >
> >>
> >> Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual
> >> time=8634.618..8637.918 rows=907 loops=1)
> >
> > You can to rewrite the queries to individual queries to see it if helps.
> >
> > In my case, I was doing
> >
> > select a.a,b.b,c.c from
> > (select a from x where) a <--- Put as a SRF
> > left join (
> > select b from y where ) b <--- Put as a SRF
> > on a.a = b.a
> >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                 http://www.postgresql.org/about/donate

Re: REPOST: Nested loops row estimates always too high

From
"Carlo Stonebanks"
Date:
My problem is that I think that SRF's are causing my problems. The SRF's
gets an automatic row estimate of 1000 rows. Add a condition to it, the
planner guesses 333 rows. Even at 333, this is an overestimate of the number
of rows returned.

I'm really disappointed - SRF's are a great way to place the enterprise's
db-centric business logic at the server.

Carlo

-----Original Message-----
From: Ow Mun Heng [mailto:Ow.Mun.Heng@wdc.com]
Sent: September 24, 2007 8:51 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] REPOST: Nested loops row estimates always too high

On Mon, 2007-09-24 at 14:12 -0400, Carlo Stonebanks wrote:
> Has anyone offered any answers to you? No one else has replied to this
post.

Overestimate of selectivity. I guess it's mainly due to my one to many
table relationships. I've tried everything from concatenated join
columns and indexing it to creating all sorts of indexes and splitting
the (1) tables into multiple tables and upping the indexes to 1000 and
turning of nestloops/enabling geqo/ tweaking the threshold/effort and
much much more (as much as I was asked to/suggested to) but still no
luck.

In my case, the individual queries were fast. So, In then end, I made a
SRF and used the SRFs to join each other. This worked better.


>
>
> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> wrote in message
> news:1190616376.17050.51.camel@neuromancer.home.net...
> > On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
> >> (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS
> >> HERE)
> >>
> >> I am noticing that my queries are spending a lot of time in nested
loops.
> >> The table/index row estimates are not bad, but the nested loops can be
> >> off
> >> by a factor of 50. In any case, they are always too high.
> >>
> >> Are the over-estimations below significant, and if so, is this an
> >> indication
> >> of a general configuration problem?
> > Sounds much like the issue I was seeing as well.
> >
> >>
> >> Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual
> >> time=8634.618..8637.918 rows=907 loops=1)
> >
> > You can to rewrite the queries to individual queries to see it if helps.
> >
> > In my case, I was doing
> >
> > select a.a,b.b,c.c from
> > (select a from x where) a <--- Put as a SRF
> > left join (
> > select b from y where ) b <--- Put as a SRF
> > on a.a = b.a
> >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                 http://www.postgresql.org/about/donate



Re: REPOST: Nested loops row estimates always too high

From
Ow Mun Heng
Date:
On Tue, 2007-09-25 at 00:53 -0400, Carlo Stonebanks wrote:
> My problem is that I think that SRF's are causing my problems. The SRF's
> gets an automatic row estimate of 1000 rows.

That's correct. That's what I see too though I may return 10K rows of
data. (min 10 columns)
But It's way faster than the normal joins I do.

> I'm really disappointed - SRF's are a great way to place the enterprise's
> db-centric business logic at the server.

Actually, I think in general, nested Loops, while evil, are just going
to be around. Even in MSSQL, when I'm pulling from, the nested loops are
many and I presume it's cos of the 8x SMP and the multiGB ram which is
making the query better.


>
> Carlo
>
> -----Original Message-----
> From: Ow Mun Heng [mailto:Ow.Mun.Heng@wdc.com]
> Sent: September 24, 2007 8:51 PM
> To: Carlo Stonebanks
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] REPOST: Nested loops row estimates always too high
>
> On Mon, 2007-09-24 at 14:12 -0400, Carlo Stonebanks wrote:
> > Has anyone offered any answers to you? No one else has replied to this
> post.
>
> Overestimate of selectivity. I guess it's mainly due to my one to many
> table relationships. I've tried everything from concatenated join
> columns and indexing it to creating all sorts of indexes and splitting
> the (1) tables into multiple tables and upping the indexes to 1000 and
> turning of nestloops/enabling geqo/ tweaking the threshold/effort and
> much much more (as much as I was asked to/suggested to) but still no
> luck.
>
> In my case, the individual queries were fast. So, In then end, I made a
> SRF and used the SRFs to join each other. This worked better.
>
>
> >
> >
> > "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> wrote in message
> > news:1190616376.17050.51.camel@neuromancer.home.net...
> > > On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
> > >> (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS
> > >> HERE)
> > >>
> > >> I am noticing that my queries are spending a lot of time in nested
> loops.
> > >> The table/index row estimates are not bad, but the nested loops can be
> > >> off
> > >> by a factor of 50. In any case, they are always too high.
> > >>
> > >> Are the over-estimations below significant, and if so, is this an
> > >> indication
> > >> of a general configuration problem?
> > > Sounds much like the issue I was seeing as well.
> > >
> > >>
> > >> Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual
> > >> time=8634.618..8637.918 rows=907 loops=1)
> > >
> > > You can to rewrite the queries to individual queries to see it if helps.
> > >
> > > In my case, I was doing
> > >
> > > select a.a,b.b,c.c from
> > > (select a from x where) a <--- Put as a SRF
> > > left join (
> > > select b from y where ) b <--- Put as a SRF
> > > on a.a = b.a
> > >
> > >
> > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 6: explain analyze is your friend
> > >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 7: You can help support the PostgreSQL project by donating at
> >
> >                 http://www.postgresql.org/about/donate
>
>

Re: REPOST: Nested loops row estimates always too high

From
"Steinar H. Gunderson"
Date:
On Tue, Sep 25, 2007 at 12:53:55AM -0400, Carlo Stonebanks wrote:
> My problem is that I think that SRF's are causing my problems. The SRF's
> gets an automatic row estimate of 1000 rows. Add a condition to it, the
> planner guesses 333 rows. Even at 333, this is an overestimate of the number
> of rows returned.
>
> I'm really disappointed - SRF's are a great way to place the enterprise's
> db-centric business logic at the server.

Fortunately, in 8.3 you can attach a row estimate to the function yourself,
which should most likely fix your problem. Look forward to the first beta :-)

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: REPOST: Nested loops row estimates always too high

From
Ow Mun Heng
Date:
On Tue, 2007-09-25 at 11:31 +0200, Steinar H. Gunderson wrote:
> On Tue, Sep 25, 2007 at 12:53:55AM -0400, Carlo Stonebanks wrote:
> > My problem is that I think that SRF's are causing my problems. The SRF's
> > gets an automatic row estimate of 1000 rows. Add a condition to it, the
> > planner guesses 333 rows. Even at 333, this is an overestimate of the number
> > of rows returned.
> >
> > I'm really disappointed - SRF's are a great way to place the enterprise's
> > db-centric business logic at the server.
>
> Fortunately, in 8.3 you can attach a row estimate to the function yourself,
> which should most likely fix your problem. Look forward to the first beta :-)
>

Where can I erad more about this new "feature"?

Re: REPOST: Nested loops row estimates always too high

From
Tom Lane
Date:
Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes:
> Where can I erad more about this new "feature"?

http://developer.postgresql.org/pgdocs/postgres/sql-createfunction.html

http://developer.postgresql.org/pgdocs/postgres/ always has a current
snapshot of CVS-HEAD documentation...

            regards, tom lane

Re: REPOST: Nested loops row estimates always too high

From
Ow Mun Heng
Date:
On Wed, 2007-09-26 at 00:02 -0400, Tom Lane wrote:
> Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes:
> > Where can I erad more about this new "feature"?
>
> http://developer.postgresql.org/pgdocs/postgres/sql-createfunction.html
>
> http://developer.postgresql.org/pgdocs/postgres/ always has a current
> snapshot of CVS-HEAD documentation...


I read these two items

...

execution_cost

    A positive number giving the estimated execution cost for the
function, in units of cpu_operator_cost. If the function returns a set,
this is the cost per returned row. If the cost is not specified, 1 unit
is assumed for C-language and internal functions, and 100 units for
functions in all other languages. Larger values cause the planner to try
to avoid evaluating the function more often than necessary.

result_rows

    A positive number giving the estimated number of rows that the
planner should expect the function to return. This is only allowed when
the function is declared to return a set. The default assumption is 1000
rows.
...
[/snip]