Thread: Nested loops overpriced

Nested loops overpriced

From: Peter Eisentraut
This query does some sort of analysis on an email archive:

        SELECT
                eh_subj.header_body AS subject,
                count(distinct eh_from.header_body)
        FROM
                email JOIN mime_part USING (email_id)
                JOIN email_header eh_subj USING (email_id, mime_part_id)
                JOIN email_header eh_from USING (email_id, mime_part_id)
        WHERE
                eh_subj.header_name = 'subject'
                AND eh_from.header_name = 'from'
                AND mime_part_id = 0
                AND (time >= timestamp '2007-05-05 17:01:59' AND time < timestamp '2007-05-05 17:01:59' + interval '60 min')
        GROUP BY
                eh_subj.header_body;

The planner chooses this plan:

                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=87142.18..87366.58 rows=11220 width=184) (actual time=7883.541..8120.647 rows=35000 loops=1)
   ->  Sort  (cost=87142.18..87170.23 rows=11220 width=184) (actual time=7883.471..7926.031 rows=35000 loops=1)
         Sort Key: eh_subj.header_body
         ->  Hash Join  (cost=46283.30..86387.42 rows=11220 width=184) (actual time=5140.182..7635.615 rows=35000 loops=1)
               Hash Cond: (eh_subj.email_id = email.email_id)
               ->  Bitmap Heap Scan on email_header eh_subj  (cost=11853.68..50142.87 rows=272434 width=104) (actual time=367.956..1719.736 rows=280989 loops=1)
                     Recheck Cond: ((mime_part_id = 0) AND (header_name = 'subject'::text))
                     ->  BitmapAnd  (cost=11853.68..11853.68 rows=27607 width=0) (actual time=326.507..326.507 rows=0 loops=1)
                           ->  Bitmap Index Scan on idx__email_header__header_body_subject  (cost=0.00..5836.24 rows=272434 width=0) (actual time=178.041..178.041 rows=280989 loops=1)
                           ->  Bitmap Index Scan on idx__email_header__header_name  (cost=0.00..5880.97 rows=281247 width=0) (actual time=114.574..114.574 rows=280989 loops=1)
                                 Index Cond: (header_name = 'subject'::text)
               ->  Hash  (cost=34291.87..34291.87 rows=11020 width=120) (actual time=4772.148..4772.148 rows=35000 loops=1)
                     ->  Hash Join  (cost=24164.59..34291.87 rows=11020 width=120) (actual time=3131.067..4706.997 rows=35000 loops=1)
                           Hash Cond: (mime_part.email_id = email.email_id)
                           ->  Seq Scan on mime_part  (cost=0.00..8355.81 rows=265804 width=12) (actual time=0.038..514.291 rows=267890 loops=1)
                                 Filter: (mime_part_id = 0)
                           ->  Hash  (cost=24025.94..24025.94 rows=11092 width=112) (actual time=3130.982..3130.982 rows=35000 loops=1)
                                 ->  Hash Join  (cost=22244.54..24025.94 rows=11092 width=112) (actual time=996.556..3069.280 rows=35000 loops=1)
                                       Hash Cond: (eh_from.email_id = email.email_id)
                                       ->  Bitmap Heap Scan on email_header eh_from  (cost=15576.58..16041.55 rows=107156 width=104) (actual time=569.762..1932.017 rows=280990 loops=1)
                                             Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
                                             ->  BitmapAnd  (cost=15576.58..15576.58 rows=160 width=0) (actual time=532.217..532.217 rows=0 loops=1)
                                                   ->  Bitmap Index Scan on dummy_index  (cost=0.00..3724.22 rows=107156 width=0) (actual time=116.386..116.386 rows=280990 loops=1)
                                                   ->  Bitmap Index Scan on idx__email_header__from_local  (cost=0.00..5779.24 rows=107156 width=0) (actual time=174.883..174.883 rows=280990 loops=1)
                                                   ->  Bitmap Index Scan on dummy2_index  (cost=0.00..5992.25 rows=107156 width=0) (actual time=173.575..173.575 rows=280990 loops=1)
                                       ->  Hash  (cost=6321.79..6321.79 rows=27694 width=8) (actual time=426.739..426.739 rows=35000 loops=1)
                                             ->  Index Scan using idx__email__time on email  (cost=0.00..6321.79 rows=27694 width=8) (actual time=50.000..375.021 rows=35000 loops=1)
                                                   Index Cond: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
 Total runtime: 8160.442 ms

The estimates all look pretty good and reasonable.

A faster plan, however, is this:

                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=1920309.81..1920534.21 rows=11220 width=184) (actual time=5349.493..5587.536 rows=35000 loops=1)
   ->  Sort  (cost=1920309.81..1920337.86 rows=11220 width=184) (actual time=5349.427..5392.110 rows=35000 loops=1)
         Sort Key: eh_subj.header_body
         ->  Nested Loop  (cost=15576.58..1919555.05 rows=11220 width=184) (actual time=537.938..5094.377 rows=35000 loops=1)
               ->  Nested Loop  (cost=15576.58..475387.23 rows=11020 width=120) (actual time=537.858..4404.330 rows=35000 loops=1)
                     ->  Nested Loop  (cost=15576.58..430265.44 rows=11092 width=112) (actual time=537.768..4024.184 rows=35000 loops=1)
                           ->  Bitmap Heap Scan on email_header eh_from  (cost=15576.58..16041.55 rows=107156 width=104) (actual time=537.621..1801.032 rows=280990 loops=1)
                                 Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
                                 ->  BitmapAnd  (cost=15576.58..15576.58 rows=160 width=0) (actual time=500.006..500.006 rows=0 loops=1)
                                       ->  Bitmap Index Scan on dummy_index  (cost=0.00..3724.22 rows=107156 width=0) (actual time=85.025..85.025 rows=280990 loops=1)
                                       ->  Bitmap Index Scan on idx__email_header__from_local  (cost=0.00..5779.24 rows=107156 width=0) (actual time=173.006..173.006 rows=280990 loops=1)
                                       ->  Bitmap Index Scan on dummy2_index  (cost=0.00..5992.25 rows=107156 width=0) (actual time=174.463..174.463 rows=280990 loops=1)
                           ->  Index Scan using email_pkey on email  (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
                                 Index Cond: (email.email_id = eh_from.email_id)
                                 Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
                     ->  Index Scan using mime_part_pkey on mime_part  (cost=0.00..4.06 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=35000)
                           Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0))
               ->  Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj  (cost=0.00..130.89 rows=13 width=104) (actual time=0.009..0.015 rows=1 loops=35000)
                     Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id))
                     Filter: (header_name = 'subject'::text)
 Total runtime: 5625.024 ms

Note how spectacularly overpriced this plan is.  The cost of a nested loop is
calculated approximately as the number of outer tuples times the cost of the
inner scan.  So slight overestimations of the inner scan, such as

Index Scan using email_pkey on email  (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)

kill this calculation.
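
Doing the arithmetic from the plan above (roughly; I'm ignoring the per-tuple
CPU charges here):

        outer scan (eh_from):    cost 16041.55, estimated 107156 rows
        inner scan (email_pkey): cost 3.85 per iteration
        nested loop estimate:    16041.55 + 107156 * 3.85 = ~428600
                                 (in the ballpark of the 430265.44 shown for that join node)

whereas each of the 280990 actual probes hits a cached email_pkey page and
takes about 0.005 ms.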

Most likely, all of this database is cached, so I tried reducing
seq_page_cost and random_page_cost, but I needed to turn them all the way
down to 0.02 or 0.03, which is almost as low as cpu_tuple_cost.  Is that
reasonable?  Or what is wrong here?


PostgreSQL 8.2.1 on x86_64-unknown-linux-gnu
work_mem = 256MB
effective_cache_size = 384MB

The machine has 1GB of RAM.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Nested loops overpriced

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Note how spectacularly overpriced this plan is.

Hmm, I'd have expected it to discount the repeated indexscans a lot more
than it seems to be doing for you.  As an example in the regression
database, note what happens to the inner indexscan cost estimate when
the number of outer tuples grows:

regression=# set enable_hashjoin TO 0;
SET
regression=# set enable_mergejoin TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# explain select * from tenk1 a join tenk1 b using (thousand) where a.unique1 = 1;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..52.82 rows=10 width=484)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..8.27 rows=1 width=244)
         Index Cond: (unique1 = 1)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 b  (cost=0.00..44.42 rows=10 width=244)
         Index Cond: (a.thousand = b.thousand)
(5 rows)

regression=# explain select * from tenk1 a join tenk1 b using (thousand) where a.ten = 1;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2531.08 rows=9171 width=484)
   ->  Seq Scan on tenk1 a  (cost=0.00..483.00 rows=900 width=244)
         Filter: (ten = 1)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 b  (cost=0.00..2.15 rows=10 width=244)
         Index Cond: (a.thousand = b.thousand)
(5 rows)

This is with 8.2.4 but AFAICS from the CVS logs, 8.2's cost estimation
code didn't change since 8.2.1.  What do you get for a comparably
simple case?
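
For instance, something along these lines against your own tables ought to
show whether the repeat-scan discount kicks in (just a sketch; use whatever
predicate is convenient):

        set enable_hashjoin TO 0;
        set enable_mergejoin TO 0;
        set enable_bitmapscan TO 0;
        explain select * from email join email_header using (email_id)
                where email_header.header_name = 'from';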

            regards, tom lane

Re: Nested loops overpriced

From: Peter Eisentraut
Am Dienstag, 8. Mai 2007 17:53 schrieb Tom Lane:
> Hmm, I'd have expected it to discount the repeated indexscans a lot more
> than it seems to be doing for you.  As an example in the regression
> database, note what happens to the inner indexscan cost estimate when
> the number of outer tuples grows:

I can reproduce your results in the regression test database. 8.2.1 and 8.2.4
behave the same.

I checked the code around cost_index(), and the assumptions appear to be
correct (at least this query doesn't produce wildly unusual data).
Apparently, however, the caching effects are much more significant than the
model takes into account.
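
For reference, as far as I can tell the discount is applied in
index_pages_fetched(), which uses the Mackert-Lohman approximation;
paraphrasing the comments in costsize.c (so treat this as a sketch):

        pages_fetched =
            min(2*T*N*s / (2*T + N*s), T)               when T <= b
            2*T*N*s / (2*T + N*s)                       when T > b and N*s <= 2*T*b/(2*T - b)
            b + (N*s - 2*T*b/(2*T - b)) * (T - b)/T     when T > b and N*s > 2*T*b/(2*T - b)

        where T = pages in the table, N = tuples in the table,
              s = fraction of the table visited, and
              b = buffer pages assumed available, prorated from
                  effective_cache_size by this table's share of total_table_pages.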

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Nested loops overpriced

From: "Daniel Cristian Cruz"
I'm seeing something weird here too...

Look:

 Nested Loop Left Join  (cost=93.38..7276.26 rows=93 width=58) (actual time=99.211..4804.525 rows=2108 loops=1)
   ->  Hash Join  (cost=93.38..3748.18 rows=93 width=4) (actual time=0.686..20.632 rows=45 loops=1)
         Hash Cond: ((u.i)::text = (m.i)::text)
         ->  Seq Scan on u  (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.010..7.813 rows=10291 loops=1)
         ->  Hash  (cost=87.30..87.30 rows=30 width=7) (actual time=0.445..0.445 rows=45 loops=1)
               ->  Index Scan using m_pkey on m  (cost=0.00..87.30 rows=30 width=7) (actual time=0.046..0.371 rows=45 loops=1)
                     Index Cond: (t = 1613)
                     Filter: ((a)::text = 'Y'::text)
   ->  Index Scan using s_pkey on s  (cost=0.00..37.33 rows=3 width=58) (actual time=19.864..106.198 rows=47 loops=45)
         Index Cond: ((u.i = s.u) AND ((s.p)::text = '4'::text) AND (s.t = 1613) AND ((s.c)::text = 'cmi.core.total_time'::text))
 Total runtime: 4805.975 ms

And after turning off the same planner settings Tom did (enable_hashjoin,
enable_mergejoin, enable_bitmapscan):

 Nested Loop Left Join  (cost=0.00..16117.12 rows=93 width=58) (actual time=2.706..168.556 rows=2799 loops=1)
   ->  Nested Loop  (cost=0.00..13187.94 rows=93 width=4) (actual time=2.622..125.739 rows=50 loops=1)
         ->  Seq Scan on u  (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.012..9.863 rows=10291 loops=1)
         ->  Index Scan using m_pkey on m  (cost=0.00..0.80 rows=1 width=7) (actual time=0.009..0.009 rows=0 loops=10291)
               Index Cond: ((m.t = 1615) AND ((u.i)::text = (m.i)::text))
               Filter: ((a)::text = 'Y'::text)
   ->  Index Scan using s_pkey on s  (cost=0.00..31.09 rows=2 width=58) (actual time=0.047..0.778 rows=56 loops=50)
         Index Cond: ((u.i = s.u) AND ((s.p)::text = '4'::text) AND (s.t = 1615) AND ((s.c)::text = 'cmi.core.total_time'::text))
 Total runtime: 169.836 ms

This is PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, with shared_buffers =
1640MB, effective_cache_size = 5400MB, and 8GB of RAM; all shared_buffers
blocks are in use (per pg_buffercache, relfilenode IS NOT NULL).

Note that even after setting default_statistics_target to 500 and running
"ANALYZE s;", the estimated row count for the index scan on s does not come
close to the actual number of rows.
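
For completeness, this is the kind of thing I tried (the per-column form is
just an alternative I know of, not something I have run here):

        SET default_statistics_target = 500;
        ANALYZE s;
        -- or, per column:
        ALTER TABLE s ALTER COLUMN c SET STATISTICS 500;
        ANALYZE s;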

Could it be related?

2007/5/9, Peter Eisentraut <peter_e@gmx.net>:
> Am Dienstag, 8. Mai 2007 17:53 schrieb Tom Lane:
> > Hmm, I'd have expected it to discount the repeated indexscans a lot more
> > than it seems to be doing for you.  As an example in the regression
> > database, note what happens to the inner indexscan cost estimate when
> > the number of outer tuples grows:
>
> I can reproduce your results in the regression test database. 8.2.1 and 8.2.4
> behave the same.
>
> I checked the code around cost_index(), and the assumptions appear to be
> correct (at least this query doesn't produce wildly unusual data).
> Apparently, however, the caching effects are much more significant than the
> model takes into account.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/



--
Daniel Cristian Cruz

Re: Nested loops overpriced

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Dienstag, 8. Mai 2007 17:53 schrieb Tom Lane:
>> Hmm, I'd have expected it to discount the repeated indexscans a lot more
>> than it seems to be doing for you.  As an example in the regression
>> database, note what happens to the inner indexscan cost estimate when
>> the number of outer tuples grows:

> I can reproduce your results in the regression test database. 8.2.1 and 8.2.4
> behave the same.

Well, there's something funny going on here.  You've got for instance

           ->  Index Scan using email_pkey on email  (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
                 Index Cond: (email.email_id = eh_from.email_id)
                 Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))

on the inside of a nestloop whose outer side is predicted to return
107156 rows.  That should've been discounted to *way* less than 3.85
cost units per iteration.

Are you using any nondefault planner settings?  How big are these
tables, anyway?
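
If it helps, one quick way to answer both (just a sketch):

        select name, setting, source from pg_settings where source <> 'default';
        select relname, pg_size_pretty(pg_relation_size(oid))
          from pg_class
         where relname in ('email', 'email_header', 'mime_part');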

            regards, tom lane

Re: Nested loops overpriced

From: Gregory Stark
"Daniel Cristian Cruz" <danielcristian@gmail.com> writes:

>   ->  Nested Loop  (cost=0.00..13187.94 rows=93 width=4) (actual time=2.622..125.739 rows=50 loops=1)
>         ->  Seq Scan on u  (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.012..9.863 rows=10291 loops=1)
>         ->  Index Scan using m_pkey on m  (cost=0.00..0.80 rows=1 width=7) (actual time=0.009..0.009 rows=0 loops=10291)

That's not discounting the nested loop for cache effect at all!

What is your effective_cache_size for this?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Nested loops overpriced

From: "Daniel Cristian Cruz"
2007/5/9, Gregory Stark <stark@enterprisedb.com>:
>
> "Daniel Cristian Cruz" <danielcristian@gmail.com> writes:
>
> >   ->  Nested Loop  (cost=0.00..13187.94 rows=93 width=4) (actual time=2.622..125.739 rows=50 loops=1)
> >         ->  Seq Scan on u  (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.012..9.863 rows=10291 loops=1)
> >         ->  Index Scan using m_pkey on m  (cost=0.00..0.80 rows=1 width=7) (actual time=0.009..0.009 rows=0 loops=10291)
>
> That's not discounting the nested loop for cache effect at all!
>
> What is your effective_cache_size for this?

effective_cache_size is 5400MB.

I forgot to mention a modification to the cost settings:
cpu_tuple_cost = 0.2
which forces the use of indexes.

I set it back to 0.01, and the plan now has an index scan on m before the
hash on u, and is about 15% slower:

         Hash Cond: ((u.i)::text = (m.i)::text)
         ->  Seq Scan on u  (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.007..6.138 rows=10292 loops=1)
         ->  Hash  (cost=87.30..87.30 rows=30 width=7) (actual time=0.185..0.185 rows=50 loops=1)
               ->  Index Scan using m_pkey on m  (cost=0.00..87.30 rows=30 width=7) (actual time=0.021..0.144 rows=50 loops=1)
                     Index Cond: (t = 1615)
                     Filter: ((a)::text = 'Y'::text)

I'm still confused, since I don't understand what "That's not discounting
the nested loop for cache effect at all!" means...

Thanks for the help.
--
Daniel Cristian Cruz

Re: Nested loops overpriced

From: Peter Eisentraut
Am Mittwoch, 9. Mai 2007 16:11 schrieb Tom Lane:
> Well, there's something funny going on here.  You've got for instance
>
>            ->  Index Scan using email_pkey on email  (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
>                  Index Cond: (email.email_id = eh_from.email_id)
>                  Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
>
> on the inside of a nestloop whose outer side is predicted to return
> 107156 rows.  That should've been discounted to *way* less than 3.85
> cost units per iteration.

This is the new plan with 8.2.4.  It's still got the same problem, though.

                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5627064.21..5627718.73 rows=32726 width=184) (actual time=4904.834..5124.585 rows=35000 loops=1)
   ->  Sort  (cost=5627064.21..5627146.03 rows=32726 width=184) (actual time=4904.771..4947.892 rows=35000 loops=1)
         Sort Key: eh_subj.header_body
         ->  Nested Loop  (cost=0.00..5624610.06 rows=32726 width=184) (actual time=0.397..4628.141 rows=35000 loops=1)
               ->  Nested Loop  (cost=0.00..1193387.12 rows=28461 width=120) (actual time=0.322..3960.360 rows=35000 loops=1)
                     ->  Nested Loop  (cost=0.00..1081957.26 rows=28648 width=112) (actual time=0.238..3572.023 rows=35000 loops=1)
                           ->  Index Scan using dummy_index on email_header eh_from  (cost=0.00..13389.15 rows=280662 width=104) (actual time=0.133..1310.248 rows=280990 loops=1)
                           ->  Index Scan using email_pkey on email  (cost=0.00..3.79 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
                                 Index Cond: (email.email_id = eh_from.email_id)
                                 Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
                     ->  Index Scan using mime_part_pkey on mime_part  (cost=0.00..3.88 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=35000)
                           Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0))
               ->  Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj  (cost=0.00..155.47 rows=18 width=104) (actual time=0.009..0.014 rows=1 loops=35000)
                     Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id))
                     Filter: (header_name = 'subject'::text)
 Total runtime: 5161.390 ms

> Are you using any nondefault planner settings?

random_page_cost = 3
effective_cache_size = 384MB

> How big are these tables, anyway?

email            35 MB
email_header    421 MB
mime_part        37 MB

Everything is analyzed, vacuumed, and reindexed.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Nested loops overpriced

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
>> Are you using any nondefault planner settings?

> random_page_cost = 3
> effective_cache_size = 384MB

>> How big are these tables, anyway?

> email        35 MB
> email_header    421 MB
> mime_part    37 MB

Hmmm ... I see at least part of the problem, which is that email_header
is joined twice in this query, which means that it's counted twice in
figuring the total volume of pages competing for cache space.  So the
thing thinks cache space is oversubscribed nearly 3X when in reality
the database is fully cached.  I remember having dithered about whether
to try to avoid counting the same physical relation more than once in
total_table_pages, but this example certainly suggests that we shouldn't
count it more than once.  Meanwhile, do the estimates get better if you set
effective_cache_size to 1GB or so?
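
In round numbers, from the table sizes you quoted (heap pages only; the
indexes would push it further):

        email                  35 MB
        email_header x 2      842 MB   (counted once per appearance in the query)
        mime_part              37 MB
        -----------------------------
        total                ~914 MB   versus effective_cache_size = 384 MB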

To return to your original comment: if you're trying to model a
situation with a fully cached database, I think it's sensible
to set random_page_cost = seq_page_cost = 0.1 or so.  You had
mentioned having to decrease them to 0.02, which seems unreasonably
small to me too, but maybe with the larger effective_cache_size
you won't have to go that far.
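
Per session, that experiment would look something like this (a sketch; rerun
your original query under EXPLAIN ANALYZE afterwards):

        SET effective_cache_size = '1GB';
        SET seq_page_cost = 0.1;
        SET random_page_cost = 0.1;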

            regards, tom lane

Re: Nested loops overpriced

From: Peter Eisentraut
Am Mittwoch, 9. Mai 2007 19:40 schrieb Tom Lane:
> I remember having dithered about whether
> to try to avoid counting the same physical relation more than once in
> total_table_pages, but this example certainly suggests that we
> shouldn't.  Meanwhile, do the estimates get better if you set
> effective_cache_size to 1GB or so?

Yes, that makes the plan significantly cheaper (something like 500,000 instead
of 5,000,000), but still a lot more expensive than the hash join (about
100,000).

> To return to your original comment: if you're trying to model a
> situation with a fully cached database, I think it's sensible
> to set random_page_cost = seq_page_cost = 0.1 or so.  You had
> mentioned having to decrease them to 0.02, which seems unreasonably
> small to me too, but maybe with the larger effective_cache_size
> you won't have to go that far.

Heh, when I decrease these parameters, the hash join gets cheaper as well.  I
can't actually get it to pick the nested-loop join.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Nested loops overpriced

From: Peter Eisentraut
Am Mittwoch, 9. Mai 2007 19:40 schrieb Tom Lane:
> Hmmm ... I see at least part of the problem, which is that email_header
> is joined twice in this query, which means that it's counted twice in
> figuring the total volume of pages competing for cache space.  So the
> thing thinks cache space is oversubscribed nearly 3X when in reality
> the database is fully cached.

I should add that other, similar queries in this database that do not involve
joining the same table twice produce seemingly optimal plans.  (It picks hash
joins which are actually faster than nested loops.)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Nested loops overpriced

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Mittwoch, 9. Mai 2007 19:40 schrieb Tom Lane:
>> Hmmm ... I see at least part of the problem, which is that email_header
>> is joined twice in this query, which means that it's counted twice in
>> figuring the total volume of pages competing for cache space. So the
>> thing thinks cache space is oversubscribed nearly 3X when in reality
>> the database is fully cached.

> I should add that other, similar queries in this database that do not
> involve joining the same table twice produce seemingly optimal plans.
> (It picks hash joins which are actually faster than nested loops.)

It strikes me that in a situation like this, where the same table is
being scanned twice by concurrent indexscans, we ought to amortize the
fetches across *both* scans rather than treating them independently;
so there are actually two different ways in which we're being too
pessimistic about the indexscanning cost.

Difficult to see how to fix that in the current planner design however;
since it's a bottom-up process, we have to cost the individual scans
without any knowledge of what approach will be chosen for other scans.

            regards, tom lane