Re: mis-estimate in nested query causes slow runtimes - Mailing list pgsql-performance

From Chris Kratz
Subject Re: mis-estimate in nested query causes slow runtimes
Date
Msg-id 3642025c0802120709q51cce630pf74ff63b32b1b154@mail.gmail.com
Whole thread Raw
In response to Re: mis-estimate in nested query causes slow runtimes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 2/11/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Kratz <chris.kratz@vistashare.com> writes:
>   ->  Nested Loop  (cost=42.74..161.76 rows=1 width=38) (actual
> time=2.932..27.772 rows=20153 loops=1)
>         ->  Hash Join  (cost=10.89..22.58 rows=1 width=24) (actual
> time=0.065..0.134 rows=1 loops=1)
>               Hash Cond: (mtchsrcprj3.funding_source_id =
> mtchsrcprjfs3.nameid)
>               ->  Seq Scan on project mtchsrcprj3  (cost=0.00..11.22
> rows=122 width=8) (actual time=0.002..0.054 rows=122 loops=1)
>               ->  Hash  (cost=10.83..10.83 rows=5 width=24) (actual
> time=0.017..0.017 rows=1 loops=1)
>                     ->  Index Scan using name_float_lfm_idx on
> namemaster mtchsrcprjfs3  (cost=0.00..10.83 rows=5 width=24) (actual
> time=0.012..0.013 rows=1 loops=1)
>                           Index Cond: (name_float_lfm = 'DWS'::text)
>         ->  Bitmap Heap Scan on transaction_details idatrndtl
> (cost=31.85..121.60 rows=1407 width=22) (actual time=2.864..12.060
> rows=20153 loops=1)
>               Recheck Cond: (idatrndtl.ida_trans_match_source_id =
> mtchsrcprj3.id)
>               ->  Bitmap Index Scan on
> transaction_details_ida_trans_match_source_id  (cost=0.00..31.50
> rows=1407 width=0) (actual time=2.696..2.696 rows=20153 loops=1)
>                     Index Cond: (idatrndtl.ida_trans_match_source_id =
> mtchsrcprj3.id)

> The first frustration is that I can't get the transaction details scan
> to get any more accurate.  It thinks it will find 1407 records,
> instead it finds 20,153.  Then for whatever reason it thinks that a
> join between 1 record and 1407 records will return 1 record.  This is
> mainly what I can't understand.  Why does it think it will only get
> one record in response when it's a left join?

I don't see any left join there ...

> PG 8.2.4 on Linux kernel 2.6.9 x64

The first thing you should do is update to 8.2.6; we've fixed a fair
number of problems since then that were fallout from the outer-join
planning rewrite in 8.2.

If it still doesn't work very well, please post the pg_stats rows for
the join columns involved (idatrndtl.ida_trans_match_source_id and
mtchsrcprj3.id).  (You do have up-to-date ANALYZE stats for both
of those tables, right?)

                        regards, tom lane


I know it's somewhat premature as we haven't had a chance to do the update yet, but here is what I did w/ the statistics with the current version for chuckles and grins just to see if it would make a difference in the plan.

# alter table project alter column id set statistics 1000;
ALTER TABLE
# analyze project;
ANALYZE
# alter table transaction_details alter column ida_trans_match_source_id set statistics 1000;
ALTER TABLE
# analyze transaction_details;
ANALYZE
# select * from pg_stats where (tablename='project' and attname='id') or (tablename='transaction_details' and attname='ida_trans_match_source_id');
 schemaname |      tablename      |          attname          | null_frac | avg_width | n_distinct |                                                                   most_common_vals                                                                   |                                                                                                                                                                                           most_common_freqs                                                                                                                                                                                            |                                                                                                                                                                                                                              histogram_bounds                                                                                                                                                                                                                              | correlation 
------------+---------------------+---------------------------+-----------+-----------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 public     | project             | id                        |         0 |         4 |         -1 |                                                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                        | {6,7,8,12,13,14,15,17,18,19,24,25,26,27,28,29,30,31,32,33,34,35,36,37,41,42,71,72,797,802,803,809,812,813,814,815,816,817,818,822,823,824,825,826,827,828,829,830,831,832,833,834,835,836,837,838,839,840,841,842,843,844,845,846,847,848,849,920,921,922,923,924,925,926,927,928,929,930,931,932,933,934,935,936,937,938,939,940,941,942,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,977,978} |    0.939317
 public     | transaction_details | ida_trans_match_source_id |  0.480469 |         4 |         74 | {832,818,930,937,923,812,931,829,837,830,836,14,809,838,936,924,921,922,814,816,817,827,815,941,835,967,926,813,968,928,920,939,925,974,833,965,933} | {0.100562,0.100233,0.0412866,0.0245354,0.0223948,0.021277,0.0198998,0.018817,0.0182431,0.0181583,0.0180236,0.0141714,0.0107633,0.00955071,0.00917646,0.00639708,0.00562364,0.00491507,0.00453584,0.0037624,0.00332828,0.00332828,0.00323846,0.00309874,0.00295403,0.00267959,0.00234526,0.00227041,0.00221552,0.00220055,0.00215565,0.00207581,0.00179138,0.00136225,0.00114269,0.00113271,0.00100796} | {6,6,7,8,15,802,802,802,802,803,803,839,841,844,844,845,845,846,927,927,934,934,935,935,938,938,940,942,952,954,955,955,957,972,972,972,978}                                                                                                                                                                                                                                                                                                                               |    0.218267
(2 rows)
 
This had no appreciable difference in the plan.  Here is the part that seems to be causing the problem again after the increase in stats.  It still thinks there is only one row in the result.

 ->  Nested Loop  (cost=42.75..161.78 rows=1 width=38) (actual time=391.797..425.337 rows=20153 loops=1)
       ->  Hash Join  (cost=10.89..22.58 rows=1 width=24) (actual time=0.069..0.139 rows=1 loops=1)
             Hash Cond: (mtchsrcprj3.funding_source_id = mtchsrcprjfs3.nameid)
             ->  Seq Scan on project mtchsrcprj3  (cost=0.00..11.22 rows=122 width=8) (actual time=0.002..0.054 rows=122 loops=1)
             ->  Hash  (cost=10.83..10.83 rows=5 width=24) (actual time=0.022..0.022 rows=1 loops=1)
                   ->  Index Scan using name_float_lfm_idx on namemaster mtchsrcprjfs3  (cost=0.00..10.83 rows=5 width=24) (actual time=0.013..0.014 rows=1 loops=1)
                         Index Cond: (name_float_lfm = 'DWS'::text)
       ->  Bitmap Heap Scan on transaction_details idatrndtl  (cost=31.87..121.61 rows=1407 width=22) (actual time=391.722..410.129 rows=20153 loops=1)
             Recheck Cond: (idatrndtl.ida_trans_match_source_id = mtchsrcprj3.id)
             ->  Bitmap Index Scan on transaction_details_ida_trans_match_source_id  (cost=0.00..31.51 rows=1407 width=0) (actual time=391.523..391.523 rows=20153 loops=1)
                   Index Cond: (idatrndtl.ida_trans_match_source_id = mtchsrcprj3.id)

Here is the relevant snippet from the query

<-- snip -->
FROM 
  accounts acc 
  left join transactions idatrn on (acc.id = idatrn.account_id) 
left join transaction_details idatrndtl on (idatrn.id = idatrndtl.transaction_id) 
left join project mtchsrcprj3 on (idatrndtl.ida_trans_match_source_id = mtchsrcprj3.id
left join namemaster mtchsrcprjfs3 on ( mtchsrcprj3.funding_source_id = mtchsrcprjfs3.nameid) 
<-- snip -->

I'll update again once we've had a chance to do the update.

-Chris

pgsql-performance by date:

Previous
From: "Chris Kratz"
Date:
Subject: Re: mis-estimate in nested query causes slow runtimes
Next
From: Tom Lane
Date:
Subject: Re: Update with Subquery Performance