Thread: 8.4 optimization regression?

8.4 optimization regression?

From: Mark Kirkwood
I am in the process of an 8.3 to 8.4 upgrade for a customer. I seem to
have stumbled upon what looks like a regression. The two databases
(8.3.14 and 8.4.8) have identical tuning parameters (where that makes
sense) and run on identical hardware. Both databases are regularly
vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
does not change the plans shown below.

The query below is a slightly simplified version of what a report uses.
I note that 8.3 chooses a better plan (approx twice as fast) and 8.4 is
massively *overestimating* the rows from the top hash join (8.3
massively underestimates them, mind you). This row overestimation becomes
more of an issue when the remaining subqueries etc. are added back into
the query below - to the point where the 8.4 runtime stretches into days.
I have some ways around that (converting NOT IN to NOT EXISTS), but
there seems to be nothing I can do to speed up this base query, which
unfortunately is a common construction in the application. Any ideas?
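For reference, the NOT IN to NOT EXISTS rewrite mentioned above is along these lines (a sketch using the thread's table names; the actual report queries are not shown here):

```sql
-- NOT IN with a subquery can produce a very slow plan, partly because
-- NOT IN must return no rows if the subquery yields any NULL:
SELECT p.person_id
FROM person p
WHERE p.person_id NOT IN (SELECT cm.person_id
                          FROM correspondence_master cm);

-- NOT EXISTS is equivalent when cm.person_id is never NULL, and lets
-- the planner use an anti join:
SELECT p.person_id
FROM person p
WHERE NOT EXISTS (SELECT 1
                  FROM correspondence_master cm
                  WHERE cm.person_id = p.person_id);
```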

Query and plans:

          EXPLAIN ANALYZE
          SELECT 1
          FROM correspondence c
          JOIN audit_log gen_al on (gen_al.audit_id = generated_audit_id)
          JOIN correspondence_master cm using(corresp_master_id)
          JOIN person p on(p.person_id = cm.person_id)

          WHERE c.corresp_type_id IN ('CL11', 'CL11A', 'CL12', 'CL15',
'CL15A', 'CL16', 'DM_1', 'DM_2')
              AND cm.person_id IS NOT NULL
              AND gen_al.audit_timestamp > ('2011-08-19
13:05'::timestamp - '6 months'::interval)
              AND p.active = true
              AND p.exclude_walklist_alt = false
              AND p.postal_address_id is null
              AND p.unpublished = false
              AND p.enrolment_status_id in ('E', 'T')
              AND p.person_type in ('M', 'D', 'O')

QUERY PLAN 8.3

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1710864.04..1835979.46 rows=33873 width=0) (actual
time=34520.372..39762.042 rows=220336 loops=1)
    Hash Cond: (p.person_id = cm.person_id)
    ->  Seq Scan on person p  (cost=0.00..115154.11 rows=2566020
width=4) (actual time=0.066..3347.560 rows=3101074 loops=1)
          Filter: (active AND (NOT exclude_walklist_alt) AND
(postal_address_id IS NULL) AND (NOT unpublished) AND
(enrolment_status_id = ANY ('{E,T}'::text[])) AND (person_type = ANY
('{M,D,O}'::text[])))
    ->  Hash  (cost=1703761.54..1703761.54 rows=568200 width=4) (actual
time=34519.041..34519.041 rows=251911 loops=1)
          ->  Hash Join  (cost=793298.59..1703761.54 rows=568200
width=4) (actual time=8383.900..34414.612 rows=251911 loops=1)
                Hash Cond: (cm.corresp_master_id = c.corresp_master_id)
                ->  Seq Scan on correspondence_master cm
(cost=0.00..778644.91 rows=33636281 width=12) (actual
time=0.045..9651.799 rows=33966957 loops=1)
                      Filter: (person_id IS NOT NULL)
                ->  Hash  (cost=785006.92..785006.92 rows=663333
width=8) (actual time=8260.951..8260.951 rows=358582 loops=1)
                      ->  Hash Join  (cost=233226.31..785006.92
rows=663333 width=8) (actual time=7042.396..8140.430 rows=358582 loops=1)
                            Hash Cond: (c.generated_audit_id =
gen_al.audit_id)
                            ->  Bitmap Heap Scan on correspondence c
(cost=74599.32..527309.84 rows=4103876 width=16) (actual
time=869.067..2474.081 rows=5297729 loops=1)
                                  Recheck Cond: (corresp_type_id = ANY
('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                                  ->  Bitmap Index Scan on
corresp_type_fk  (cost=0.00..73573.35 rows=4103876 width=0) (actual
time=834.201..834.201 rows=5297729 loops=1)
                                        Index Cond: (corresp_type_id =
ANY ('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                            ->  Hash  (cost=111084.39..111084.39
rows=2897808 width=8) (actual time=3373.619..3373.619 rows=2854079 loops=1)
                                  ->  Index Scan using
audit_log_audit_timestamp on audit_log gen_al  (cost=0.00..111084.39
rows=2897808 width=8) (actual time=0.164..2322.894 rows=2854079 loops=1)
                                        Index Cond: (audit_timestamp >
'2011-02-19 13:05:00'::timestamp without time zone)
  Total runtime: 39787.258 ms



QUERY PLAN 8.4

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=914274.32..2577396.93 rows=3757429 width=0) (actual
time=11425.505..62959.205 rows=220336 loops=1)
    Hash Cond: (c.generated_audit_id = gen_al.audit_id)
    ->  Hash Join  (cost=761247.58..2313627.39 rows=3757429 width=8)
(actual time=8335.459..58598.883 rows=3087353 loops=1)
          Hash Cond: (cm.person_id = p.person_id)
          ->  Hash Join  (cost=604036.44..2031002.71 rows=4202314
width=12) (actual time=3850.257..49067.879 rows=3586540 loops=1)
                Hash Cond: (cm.corresp_master_id = c.corresp_master_id)
                ->  Seq Scan on correspondence_master cm
(cost=0.00..777996.60 rows=33964871 width=12) (actual
time=0.024..10206.688 rows=33966957 loops=1)
                      Filter: (person_id IS NOT NULL)
                ->  Hash  (cost=530987.51..530987.51 rows=4202314
width=16) (actual time=3848.577..3848.577 rows=5297729 loops=1)
                      ->  Bitmap Heap Scan on correspondence c
(cost=76346.23..530987.51 rows=4202314 width=16) (actual
time=673.660..2272.497 rows=5297729 loops=1)
                            Recheck Cond: (corresp_type_id = ANY
('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                            ->  Bitmap Index Scan on corresp_type_fk
(cost=0.00..75295.65 rows=4202314 width=0) (actual time=640.301..640.301
rows=5297729 loops=1)
                                  Index Cond: (corresp_type_id = ANY
('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
          ->  Hash  (cost=115091.03..115091.03 rows=2567289 width=4)
(actual time=4484.737..4484.737 rows=3101076 loops=1)
                ->  Seq Scan on person p  (cost=0.00..115091.03
rows=2567289 width=4) (actual time=0.013..3406.661 rows=3101076 loops=1)
                      Filter: (active AND (NOT exclude_walklist_alt) AND
(postal_address_id IS NULL) AND (NOT unpublished) AND
(enrolment_status_id = ANY ('{E,T}'::text[])) AND (person_type = ANY
('{M,D,O}'::text[])))
    ->  Hash  (cost=107100.99..107100.99 rows=2799260 width=8) (actual
time=2962.039..2962.039 rows=2854070 loops=1)
          ->  Index Scan using audit_log_audit_timestamp on audit_log
gen_al  (cost=0.00..107100.99 rows=2799260 width=8) (actual
time=0.181..2055.802 rows=2854070 loops=1)
                Index Cond: (audit_timestamp > '2011-02-19
13:05:00'::timestamp without time zone)
  Total runtime: 63007.358 ms


Cheers

Mark

Re: 8.4 optimization regression?

From: Tom Lane
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> I am in the process of an 8.3 to 8.4 upgrade for a customer. I seem to
> have stumbled upon what looks like a regression. The two databases
> (8.3.14 and 8.4.8) have identical tuning parameters (where that makes
> sense) and run on identical hardware. Both databases are regularly
> vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
> does not change the plans shown below.

Hmmm ... this is structurally a pretty simple query, so I'm surprised
that 8.3 and 8.4 see it very much differently.  The relation-level
estimates and plan choices are very nearly the same; the only thing
that's changed much is the estimates of the join sizes, and there were
not that many changes in the join selectivity estimation for simple
inner joins.  I wonder whether you are seeing a bad side-effect of this
patch:

http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30

That code would only be reached when one or both join columns lack MCV
lists in pg_stats; if you had analyzed, the only reason for that to be
the case is if the column is unique (or nearly so, in ANALYZE's opinion).

            regards, tom lane

Re: 8.4 optimization regression?

From: Mark Kirkwood
On 24/08/11 15:15, Tom Lane wrote:
> Mark Kirkwood<mark.kirkwood@catalyst.net.nz>  writes:
>> I am in the process of an 8.3 to 8.4 upgrade for a customer. I seem to
>> have stumbled upon what looks like a regression. The two databases
>> (8.3.14 and 8.4.8) have identical tuning parameters (where that makes
>> sense) and run on identical hardware. Both databases are regularly
>> vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
>> does not change the plans shown below.
> Hmmm ... this is structurally a pretty simple query, so I'm surprised
> that 8.3 and 8.4 see it very much differently.  The relation-level
> estimates and plan choices are very nearly the same; the only thing
> that's changed much is the estimates of the join sizes, and there were
> not that many changes in the join selectivity estimation for simple
> inner joins.  I wonder whether you are seeing a bad side-effect of this
> patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>
> That code would only be reached when one or both join columns lack MCV
> lists in pg_stats; if you had analyzed, the only reason for that to be
> the case is if the column is unique (or nearly so, in ANALYZE's opinion).
>

Right, that will be the case - audit_id is the primary key for audit_log.
Stats entries for the join columns look like:

=# SELECT tablename
,attname,n_distinct,most_common_vals,most_common_freqs,histogram_bounds
FROM pg_stats WHERE tablename IN ('correspondence','audit_log') AND
attname IN ('audit_id','generated_audit_id');
-[ RECORD 1

]-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
tablename         | correspondence
attname           | generated_audit_id
n_distinct        | 4625
most_common_vals  |

{11983812,15865407,4865496,717803,842478,725709,7255002,2389608,4604147,9996442,8693810,4604145,5916872,2389606,3135764,3307895,10527855,7254994,8959356,9595632,6279892,9595640,2604937,5916870,6279950,1180586,2604768,1180638,11526036,4451499,5252795,6279919,6279955,8958886,2604929,6279904,7543722,8959031,2604804,7543823,8958930,8959226,1180650,2604871,3530205,6279960,11051216,11051224,3530140,7838365,15060203,1180309,1180423,3530177,7543749,7543790,8959026,8959083,12834024,1180447,1180632,1180664,2604779,2604901,2604943,6279944,6280027,7543820,8958992,8959011,3530107,6279923,7543085,15866296,1180470,1180473,2604846,2604874,2604892,6279977,6280046,7543496,8958904,8958914,1180281,1180497,2604801,2604973,3529965,6280051,7543654,7543667,7543815,2604840,2604852,2604877,6279947,6279991,6280016,6280095}
most_common_freqs |

{0.0787667,0.0769333,0.00906667,0.00886667,0.00826667,0.00593333,0.00326667,0.003,0.00293333,0.0027,0.00266667,0.00263333,0.00256667,0.0025,0.00246667,0.00203333,0.00203333,0.00196667,0.0019,0.00186667,0.00183333,0.0018,0.00173333,0.00173333,0.00173333,0.0017,0.0017,0.00166667,0.00166667,0.00163333,0.00163333,0.00163333,0.00163333,0.00163333,0.0016,0.0016,0.0016,0.0016,0.00156667,0.00156667,0.00156667,0.00156667,0.00153333,0.00153333,0.00153333,0.00153333,0.00153333,0.00153333,0.0015,0.0015,0.0015,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.0014,0.0014,0.0014,0.0014,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.0013,0.0013,0.0013,0.0013,0.0013,0.0013,0.0013}
histogram_bounds  |

{-614,149124,436276,734992,1111802,1180324,1180449,1180481,1180507,1180610,1180640,1180656,1180672,1475625,1671884,1882852,2257454,2521497,2604750,2604785,2604821,2604857,2604895,2604923,2604957,2683740,3050195,3264561,3529673,3529821,3529894,3530041,3530072,3530093,3530125,3530151,3530181,3530216,3655474,3947599,4230064,4451407,4451648,4604143,4899541,5229325,5442183,5783894,6044973,6279792,6279830,6279872,6279934,6279988,6280024,6280057,6280087,6448106,6666623,6935161,7223774,7543005,7543220,7543548,7543678,7543706,7543733,7543763,7543785,7543831,7730234,8168222,8473126,8704950,8958785,8958894,8958920,8958946,8958981,8959021,8959054,8960124,8963427,9092223,9393810,9649295,9915513,10116459,10340456,10533434,10908764,11474630,12282455,13428124,14054953,14755339,15060207,15769093,16442810,17071416,17860068}
-[ RECORD 2

]-----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
tablename         | audit_log
attname           | audit_id
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  |

{-899,172915,346206,520991,707646,900140,1090647,1274076,1455922,1631357,1802760,1992032,2160450,2341946,2514505,2670638,2851069,3031271,3190297,3359936,3536716,3706348,3899491,4067528,4232343,4405734,4574480,4753591,4930502,5122384,5287148,5460009,5657326,5824340,6020883,6214608,6409401,6606366,6779433,6945221,7123123,7294108,7495488,7649303,7816323,7997936,8191973,8362771,8526974,8733309,8911487,9099916,9289773,9472155,9661398,9825969,10004845,10176201,10351232,10527642,10680265,10853519,11040326,11229650,11422181,11605451,11806172,11985734,12171654,12364324,12559368,12729402,12912927,13073102,13287145,13455458,13649471,13826738,14004258,14187125,14356543,14539334,14715631,14895857,15060855,15231913,15404735,15577098,15742060,15901413,16088450,16270629,16458319,16650444,16826581,17003138,17158176,17315993,17497551,17687046,17879372}


Cheers

Mark





Re: 8.4 optimization regression?

From: Mark Kirkwood
On 24/08/11 15:15, Tom Lane wrote:
>
> Hmmm ... this is structurally a pretty simple query, so I'm surprised
> that 8.3 and 8.4 see it very much differently.  The relation-level
> estimates and plan choices are very nearly the same; the only thing
> that's changed much is the estimates of the join sizes, and there were
> not that many changes in the join selectivity estimation for simple
> inner joins.  I wonder whether you are seeing a bad side-effect of this
> patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>

Here is what the plan looks like with that patch reversed (it is back to
8.3 speed too).

QUERY PLAN  8.4 - 7f3eba30

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=948567.18..1882454.51 rows=349702 width=0) (actual
time=12320.037..38146.697 rows=217427 loops=1)
    Hash Cond: (cm.person_id = p.person_id)
    ->  Hash Join  (cost=791689.32..1702481.12 rows=581887 width=4)
(actual time=7492.004..32727.783 rows=248441 loops=1)
          Hash Cond: (cm.corresp_master_id = c.corresp_master_id)
          ->  Seq Scan on correspondence_master cm
(cost=0.00..777460.25 rows=34003380 width=12) (actual
time=0.016..8977.181 rows=33960209 loops=1)
                Filter: (person_id IS NOT NULL)
          ->  Hash  (cost=783297.43..783297.43 rows=671351 width=8)
(actual time=7375.019..7375.019 rows=354456 loops=1)
                ->  Hash Join  (cost=231577.28..783297.43 rows=671351
width=8) (actual time=6374.538..7257.067 rows=354456 loops=1)
                      Hash Cond: (c.generated_audit_id = gen_al.audit_id)
                      ->  Bitmap Heap Scan on correspondence c
(cost=77121.49..532445.85 rows=4247118 width=16) (actual
time=742.738..2790.225 rows=5293603 loops=1)
                            Recheck Cond: (corresp_type_id = ANY
('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                            ->  Bitmap Index Scan on corresp_type_fk
(cost=0.00..76059.71 rows=4247118 width=0) (actual time=708.164..708.164
rows=5293603 loops=1)
                                  Index Cond: (corresp_type_id = ANY
('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                      ->  Hash  (cost=108073.47..108073.47 rows=2827066
width=8) (actual time=2759.145..2759.145 rows=2819891 loops=1)
                            ->  Index Scan using
audit_log_audit_timestamp on audit_log gen_al  (cost=0.00..108073.47
rows=2827066 width=8) (actual time=0.085..1800.175 rows=2819891 loops=1)
                                  Index Cond: (audit_timestamp >
'2011-02-19 13:05:00'::timestamp without time zone)
    ->  Hash  (cost=115044.00..115044.00 rows=2549829 width=4) (actual
time=4827.310..4827.310 rows=3101177 loops=1)
          ->  Seq Scan on person p  (cost=0.00..115044.00 rows=2549829
width=4) (actual time=0.061..3600.767 rows=3101177 loops=1)
                Filter: (active AND (NOT exclude_walklist_alt) AND
(postal_address_id IS NULL) AND (NOT unpublished) AND
(enrolment_status_id = ANY ('{E,T}'::text[])) AND (person_type = ANY
('{M,D,O}'::text[])))
  Total runtime: 38171.865 ms


Cheers

Mark


Re: 8.4 optimization regression?

From: Mark Kirkwood
On 24/08/11 17:22, Mark Kirkwood wrote:
> On 24/08/11 15:15, Tom Lane wrote:
>>
>> Hmmm ... this is structurally a pretty simple query, so I'm surprised
>> that 8.3 and 8.4 see it very much differently.  The relation-level
>> estimates and plan choices are very nearly the same; the only thing
>> that's changed much is the estimates of the join sizes, and there were
>> not that many changes in the join selectivity estimation for simple
>> inner joins.  I wonder whether you are seeing a bad side-effect of this
>> patch:
>>
>> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>>
>>
>
> Here is what the plan looks like with that patch reversed (it is back
> to 8.3 speed too).
>
> QUERY PLAN  8.4 - 7f3eba30  (better plan snipped)
>
>

I note from the commit message that the fix's test case was from Grzegorz
Jaskiewicz (an antijoin against a small subset of a relation). I was not
able to find this in the archives - Grzegorz, do you recall the actual
test case? I thought it might be useful for me to spend some time
studying both cases and seeing if I can come up with any tweaks that
would let both your and my queries work well!

Cheers

Mark


Re: 8.4 optimization regression?

From: Grzegorz Jaśkiewicz
2011/8/29 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:

> I note from the commit message that the fix test case was from Grzegorz
> Jaskiewicz (antijoin against a small subset of a relation).  I was not able
> to find this in the archives - Grzegorz do you recall the actual test case?
> I thought it might be useful for me to spend some time studying both cases
> and seeing if I can come up with any tweaks that would let both your and my
> queries work well!

Sorry, I don't remember that particular example. If I complained about
it, it would have been on this list or the general list.
I'll have a look by date.

--
GJ

Re: 8.4 optimization regression?

From: Mark Kirkwood
On 30/08/11 21:43, Grzegorz Jaśkiewicz wrote:
> 2011/8/29 Mark Kirkwood<mark.kirkwood@catalyst.net.nz>:
>
>> I note from the commit message that the fix test case was from Grzegorz
>> Jaskiewicz (antijoin against a small subset of a relation).  I was not able
>> to find this in the archives - Grzegorz do you recall the actual test case?
>> I thought it might be useful for me to spend some time studying both cases
>> and seeing if I can come up with any tweaks that would let both your and my
>> queries work well!
> Sorry, I don't remember that particular example. If I complained about
> it, it would  have been on this list or the general list.
> I'll have a look by date.
>

Thanks - however I think I have managed to make up a good test case that
shows the particular commit working. More on that to come!

Cheers

Mark

Re: 8.4 optimization regression?

From: Mark Kirkwood
On 24/08/11 15:15, Tom Lane wrote:
> Mark Kirkwood<mark.kirkwood@catalyst.net.nz>  writes:
>> I am in the process of an 8.3 to 8.4 upgrade for a customer. I seem to
>> have stumbled upon what looks like a regression. The two databases
>> (8.3.14 and 8.4.8) have identical tuning parameters (where that makes
>> sense) and run on identical hardware. Both databases are regularly
>> vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
>> does not change the plans shown below.
> Hmmm ... this is structurally a pretty simple query, so I'm surprised
> that 8.3 and 8.4 see it very much differently.  The relation-level
> estimates and plan choices are very nearly the same; the only thing
> that's changed much is the estimates of the join sizes, and there were
> not that many changes in the join selectivity estimation for simple
> inner joins.  I wonder whether you are seeing a bad side-effect of this
> patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>
> That code would only be reached when one or both join columns lack MCV
> lists in pg_stats; if you had analyzed, the only reason for that to be
> the case is if the column is unique (or nearly so, in ANALYZE's opinion).
>
>

I've come up with (hopefully) a good set of semi, anti and regular joins
to demonstrate the effect of this commit. I've attached them, along with
the schema generator (I believe I've used this before for optimization
examples...).
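The attachments are not reproduced here; inferring from the plans that follow, the test schema would look roughly like this (a sketch only - the actual attached generator may differ in details such as row counts and data distributions):

```sql
-- Hypothetical reconstruction of the test schema, based on the table,
-- column, and index names (e.g. nodekeyword_pk) visible in the plans below.
CREATE TABLE node (
    nodeid   integer PRIMARY KEY,
    updated  timestamp NOT NULL
);

CREATE TABLE nodekeyword (
    nodeid    integer NOT NULL REFERENCES node (nodeid),
    keywordid integer NOT NULL,
    CONSTRAINT nodekeyword_pk PRIMARY KEY (nodeid, keywordid)
);
```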

Also, I've tried out an experimental patch that gets better row estimates
both for joins like the one I'm having trouble with *and* for the anti
joins the commit was written for.

So firstly consider an anti join (these are run against git HEAD rather
than 8.4):

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-12-01'::timestamp );

With commit:
                                                           QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=426079.34..765699.66 rows=1599293 width=0)
(actual time=29907.716..47255.825 rows=1839193 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.373..11838.738 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29883.980..29883.980 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.339..29295.764 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=426079.34..760501.96 rows=1 width=0) (actual
time=30409.336..47919.613 rows=1839193 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.359..12081.372 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=30392.235..30392.235 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.384..29806.407 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


Note that without the commit, the row estimate for the anti join is
hopelessly wrong, so clearly the commit does the job here (I think this
models the test case for said commit)!

Now some joins:

EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid =
nk.nodeid) WHERE n.updated > '2011-01-01'::timestamp AND nk.keywordid <
100000;

With commit:
                                                           QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=501666.88..871512.65 rows=1991560 width=0) (actual
time=30032.836..53073.731 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.327..14393.629 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30017.777..30017.777 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.005..23272.287 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=501666.88..871510.70 rows=1991365 width=0) (actual
time=30549.498..54852.399 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.331..13760.417 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30534.464..30534.464 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.005..23696.167 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Another join:

EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid =
nk.nodeid) WHERE n.updated > '2011-12-01'::timestamp AND nk.keywordid <
100000;

With commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=426079.34..764424.63 rows=392267 width=0) (actual
time=29295.966..45578.876 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.452..12367.760 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29273.571..29273.571 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=10.899..28678.818 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=426079.34..762064.41 rows=156245 width=0) (actual
time=29179.313..44605.243 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.486..11546.469 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29156.889..29156.889 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=10.915..28545.553 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


So in the case where we filter out a large percentage of the rows, the
commit inflates the estimates... consider a more extreme example:


EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid =
nk.nodeid) WHERE n.updated > '2011-12-27'::timestamp AND nk.keywordid <
10000;

With commit:
                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..569488.45 rows=16344 width=0) (actual
time=55.452..65341.135 rows=604 loops=1)
    ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4)
(actual time=13.537..46138.214 rows=14952 loops=1)
          Filter: (updated > '2011-12-27 00:00:00'::timestamp without
time zone)
    ->  Index Scan using nodekeyword_pk on nodekeyword nk
(cost=0.00..9.16 rows=1 width=4) (actual time=1.277..1.279 rows=0
loops=14952)
          Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


Without commit:
                                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..569488.45 rows=631 width=0) (actual
time=43.969..64988.036 rows=604 loops=1)
    ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4)
(actual time=2.060..46065.879 rows=14952 loops=1)
          Filter: (updated > '2011-12-27 00:00:00'::timestamp without
time zone)
    ->  Index Scan using nodekeyword_pk on nodekeyword nk
(cost=0.00..9.16 rows=1 width=4) (actual time=1.259..1.260 rows=0
loops=14952)
          Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


So clearly this commit is not so good for this type of join (this models
the case I posted initially).

Now four semi joins:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-12-01'::timestamp );

With commit:
                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..753629.40 rows=392267 width=0)
(actual time=28405.965..43724.471 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.767..11561.340 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=28391.293..28391.293 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.038..27820.097 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Without commit:
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..780417.56 rows=1991560 width=0)
(actual time=29447.638..44738.280 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.771..11501.350 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29433.952..29433.952 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.040..28850.800 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Clearly row estimation is hopelessly broken *without* this commit here.

Another semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );

With commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0)
(actual time=29048.154..51230.453 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.423..13430.618 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29024.442..29024.442 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..22384.904 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Without commit:
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0)
(actual time=28914.970..51162.918 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.504..13780.506 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=28891.705..28891.705 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.008..22082.459 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Another semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );

With commit:
                                                             QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0)
(actual time=30120.347..49646.175 rows=199050 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.359..16335.889 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30072.444..30072.444 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..23409.799 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

Without commit:
                                                           QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0)
(actual time=29395.513..48857.600 rows=199050 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.528..16261.983 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29348.826..29348.826 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..22505.930 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


Final semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-12-01'::timestamp );

With commit:
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0)
(actual time=29060.665..44713.615 rows=16003 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.366..15064.457 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29026.017..29026.017 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.039..28441.039 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Without commit:
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0)
(actual time=28969.107..43725.339 rows=16003 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.486..14198.613 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=28935.248..28935.248 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.047..28343.005 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

Well this guy is too sneaky for either case :-(

We seem to need a patch variant that *only* clamps the estimates in the
anti or semi join case, e.g. (note: this diff is against git HEAD):

diff --git a/src/backend/utils/adt/selfuncs.c
b/src/backend/utils/adt/selfuncs.c
index e065826..bf5002f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2257,11 +2257,6 @@ eqjoinsel_inner(Oid operator,
         double      nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
         double      nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;

-       if (vardata1->rel)
-           nd1 = Min(nd1, vardata1->rel->rows);
-       if (vardata2->rel)
-           nd2 = Min(nd2, vardata2->rel->rows);
-
         selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
         if (nd1 > nd2)
             selec /= nd1;


Now re-run all the queries, first the anti join:
                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=426079.34..765699.66 rows=1599293 width=0)
(actual time=30121.008..48503.453 rows=1839193 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.623..12853.522 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=30108.058..30108.058 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.347..29508.393 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


And the three plain joins:

                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=501666.88..871510.70 rows=1991365 width=0) (actual
time=30148.073..52370.308 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.291..13300.233 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30124.453..30124.453 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..23334.774 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)


                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=426079.34..762064.41 rows=156245 width=0) (actual
time=29954.251..46014.379 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.420..12126.142 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29936.578..29936.578 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=10.934..29357.789 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)


                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..569488.45 rows=631 width=0) (actual
time=44.065..67179.686 rows=604 loops=1)
    ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4)
(actual time=2.165..48523.075 rows=14952 loops=1)
          Filter: (updated > '2011-12-27 00:00:00'::timestamp without
time zone)
    ->  Index Scan using nodekeyword_pk on nodekeyword nk
(cost=0.00..9.16 rows=1 width=4) (actual time=1.241..1.242 rows=0
loops=14952)
          Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


And the four semi joins:

                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..753629.40 rows=392267 width=0)
(actual time=29355.949..45220.958 rows=160587 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=5.731..11983.132 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=29342.387..29342.387 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.039..28763.514 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0)
(actual time=30823.334..53136.910 rows=1993866 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=12.555..13881.366 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=30800.017..30800.017 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..24028.932 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0)
(actual time=29278.861..48346.647 rows=199050 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.523..15809.390 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29232.161..29232.161 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.009..22541.899 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0)
(actual time=28594.210..42976.001 rows=16003 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=12.581..13810.924 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual
time=28560.258..28560.258 rows=401678 loops=1)
          Buckets: 4096  Batches: 16  Memory Usage: 891kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267
width=4) (actual time=0.048..27983.235 rows=401678 loops=1)
                Filter: (updated > '2011-12-01 00:00:00'::timestamp
without time zone)

(this last one was wildly inaccurate before the patch)

So this looks quite encouraging (unless I have overlooked a set of
queries that now perform worse, which could be the case). Thoughts?

regards

Mark





Re: 8.4 optimization regression?

From
Tom Lane
Date:
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> [ assorted examples showing that commit
>   7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]

Thanks for the test cases.  After playing with these for a bit I believe
I've figured out the error in my previous thinking.  Clamping the
ndistinct value like that can improve matters when applied to the inside
relation of a semi or anti join, but in all other cases it's just wrong.
If you think about what is happening in eqjoinsel_inner with the patch,
we are reducing the ndistinct estimate for the join key column
proportionally to the selectivity of whatever baserel restrictions
apply.  This then results in proportionally increasing the selectivity
number for the join condition --- in other words, we're more or less
cancelling out the effects of one or the other relation's base
restrictions.  So that's pretty broken in general.  The reason it is
important for semi/antijoin inner relations is that this is actually the
only way that restrictions applied to the inner rel get to impact the
join size estimate at all, since set_joinrel_size_estimates is not going
to factor the inner rel size into what it multiplies the join selectivity
against.

In short, I was mistakenly extrapolating from the observation that it
helped to hack the ndistinct estimate for a semijoin's inner rel, to
the conclusion that we should do that for all join input rels.

So, not only are you correct that we should revert the changes to
eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
It should only be clamping the ndistinct value for the inner side.
And I think it needs to be taking that into account for the case where
it does have MCVs as well as the case where it doesn't.

So I'll go back to this with hopefully a clearer picture of what's
happening.  Thanks again for the test cases.

            regards, tom lane

Re: 8.4 optimization regression?

From
Tom Lane
Date:
I wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
>> [ assorted examples showing that commit
>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]

> ...
> So, not only are you correct that we should revert the changes to
> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.

I've retested these examples with the patches I committed yesterday.
Six of the eight examples are estimated pretty nearly dead on, while the
other two are estimated about 50% too high (still a lot better than
before).  AFAICT there's no easy way to improve those estimates further;
eqjoinsel_semi just plain hasn't got enough information to know how many
matches there will be.

            regards, tom lane

Re: 8.4 optimization regression?

From
Mark Kirkwood
Date:
On 02/09/11 11:13, Tom Lane wrote:
> I wrote:
>> Mark Kirkwood<mark.kirkwood@catalyst.net.nz>  writes:
>>> [ assorted examples showing that commit
>>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
>> ...
>> So, not only are you correct that we should revert the changes to
>> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
> I've retested these examples with the patches I committed yesterday.
> Six of the eight examples are estimated pretty nearly dead on, while the
> other two are estimated about 50% too high (still a lot better than
> before).  AFAICT there's no easy way to improve those estimates further;
> eqjoinsel_semi just plain hasn't got enough information to know how many
> matches there will be.
>
>

Just noticed your two commits this morning and ran them through the
examples too - results look really good! Not only are the plain join
queries looking way better but that last semi join that was way off is
now being estimated pretty close. Should be interesting to see how much
this improves more complex queries!

Cheers

Mark


Re: 8.4 optimization regression?

From
Mark Kirkwood
Date:
On 02/09/11 11:18, Mark Kirkwood wrote:
> On 02/09/11 11:13, Tom Lane wrote:
>> I wrote:
>>> Mark Kirkwood<mark.kirkwood@catalyst.net.nz>  writes:
>>>> [ assorted examples showing that commit
>>>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
>>> ...
>>> So, not only are you correct that we should revert the changes to
>>> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
>> I've retested these examples with the patches I committed yesterday.
>> Six of the eight examples are estimated pretty nearly dead on, while the
>> other two are estimated about 50% too high (still a lot better than
>> before).  AFAICT there's no easy way to improve those estimates further;
>> eqjoinsel_semi just plain hasn't got enough information to know how many
>> matches there will be.
>>
>>
>
> Just noticed your two commits this morning and ran them through the
> examples too - results look really good! Not only are the plain join
> queries looking way better but that last semi join that was way off is
> now being estimated pretty close. Should be interesting to see how
> much this improves more complex queries!
>
>

While this is still fresh in your mind, a couple of additional anti join
queries are still managing to sneak past estimation:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=501666.88..851597.05 rows=1 width=0) (actual
time=29956.971..50933.702 rows=5914 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29345.238..29345.238 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..22731.316 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
                                                             QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=501666.88..821806.96 rows=1 width=0) (actual
time=46497.231..49196.057 rows=566 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29901.178..29901.178 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.008..23207.964 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)







Re: 8.4 optimization regression?

From
Tom Lane
Date:
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> While this is still fresh in your mind, a couple of additional anti join
> queries are still managing to sneak past estimation:

Yeah, those are estimating that all the outer rows have join partners,
because there are more distinct values in the sub-select than there are
in the outer relation.  AFAICS there are not any errors in the
statistics, it's just that the estimation rule falls down here.

If you've heard of a better estimator for semijoin/antijoin selectivity,
I'm all ears.  The best idea I have at the moment is to put an arbitrary
upper limit on the estimated selectivity, but that would be, well,
arbitrary.

            regards, tom lane