Thread: 8.4 optimization regression?
I am in the process of an 8.3 to 8.4 upgrade for a customer, and I seem to have stumbled upon what looks like a regression. The two databases (8.3.14 and 8.4.8) have identical tuning parameters (where that makes sense) and run on identical hardware. Both databases are regularly vacuumed and analyzed (not by autovacuum), and performing an ANALYZE does not change the plans shown below.

The query below is a slightly simplified version of what a report uses. I note that 8.3 chooses a better plan (approximately twice as fast), and that 8.4 is massively *overestimating* the rows from the top hash join (8.3 massively underestimates them, mind you). This row overestimate becomes more of an issue when the remaining subqueries etc. are added back into the query below - to the point where the 8.4 runtime goes into days. Now I have some workarounds for that (e.g. converting NOT IN to NOT EXISTS), but there seems to be nothing I can do to speed up this base query, which unfortunately is a common construction in the application. Any ideas?
Query and plans:

EXPLAIN ANALYZE
SELECT 1
FROM correspondence c
JOIN audit_log gen_al ON (gen_al.audit_id = generated_audit_id)
JOIN correspondence_master cm USING (corresp_master_id)
JOIN person p ON (p.person_id = cm.person_id)
WHERE c.corresp_type_id IN ('CL11', 'CL11A', 'CL12', 'CL15', 'CL15A', 'CL16', 'DM_1', 'DM_2')
AND cm.person_id IS NOT NULL
AND gen_al.audit_timestamp > ('2011-08-19 13:05'::timestamp - '6 months'::interval)
AND p.active = true
AND p.exclude_walklist_alt = false
AND p.postal_address_id IS NULL
AND p.unpublished = false
AND p.enrolment_status_id IN ('E', 'T')
AND p.person_type IN ('M', 'D', 'O');

QUERY PLAN 8.3
-------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1710864.04..1835979.46 rows=33873 width=0) (actual time=34520.372..39762.042 rows=220336 loops=1)
   Hash Cond: (p.person_id = cm.person_id)
   ->  Seq Scan on person p  (cost=0.00..115154.11 rows=2566020 width=4) (actual time=0.066..3347.560 rows=3101074 loops=1)
         Filter: (active AND (NOT exclude_walklist_alt) AND (postal_address_id IS NULL) AND (NOT unpublished) AND (enrolment_status_id = ANY ('{E,T}'::text[])) AND (person_type = ANY ('{M,D,O}'::text[])))
   ->  Hash  (cost=1703761.54..1703761.54 rows=568200 width=4) (actual time=34519.041..34519.041 rows=251911 loops=1)
         ->  Hash Join  (cost=793298.59..1703761.54 rows=568200 width=4) (actual time=8383.900..34414.612 rows=251911 loops=1)
               Hash Cond: (cm.corresp_master_id = c.corresp_master_id)
               ->  Seq Scan on correspondence_master cm  (cost=0.00..778644.91 rows=33636281 width=12) (actual time=0.045..9651.799 rows=33966957 loops=1)
                     Filter: (person_id IS NOT NULL)
               ->  Hash  (cost=785006.92..785006.92 rows=663333 width=8) (actual time=8260.951..8260.951 rows=358582 loops=1)
                     ->  Hash Join  (cost=233226.31..785006.92 rows=663333 width=8) (actual time=7042.396..8140.430 rows=358582 loops=1)
                           Hash Cond: (c.generated_audit_id = gen_al.audit_id)
                           ->  Bitmap Heap Scan on correspondence c  (cost=74599.32..527309.84 rows=4103876 width=16) (actual time=869.067..2474.081 rows=5297729 loops=1)
                                 Recheck Cond: (corresp_type_id = ANY ('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                                 ->  Bitmap Index Scan on corresp_type_fk  (cost=0.00..73573.35 rows=4103876 width=0) (actual time=834.201..834.201 rows=5297729 loops=1)
                                       Index Cond: (corresp_type_id = ANY ('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                           ->  Hash  (cost=111084.39..111084.39 rows=2897808 width=8) (actual time=3373.619..3373.619 rows=2854079 loops=1)
                                 ->  Index Scan using audit_log_audit_timestamp on audit_log gen_al  (cost=0.00..111084.39 rows=2897808 width=8) (actual time=0.164..2322.894 rows=2854079 loops=1)
                                       Index Cond: (audit_timestamp > '2011-02-19 13:05:00'::timestamp without time zone)
 Total runtime: 39787.258 ms

QUERY PLAN 8.4
-------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=914274.32..2577396.93 rows=3757429 width=0) (actual time=11425.505..62959.205 rows=220336 loops=1)
   Hash Cond: (c.generated_audit_id = gen_al.audit_id)
   ->  Hash Join  (cost=761247.58..2313627.39 rows=3757429 width=8) (actual time=8335.459..58598.883 rows=3087353 loops=1)
         Hash Cond: (cm.person_id = p.person_id)
         ->  Hash Join  (cost=604036.44..2031002.71 rows=4202314 width=12) (actual time=3850.257..49067.879 rows=3586540 loops=1)
               Hash Cond: (cm.corresp_master_id = c.corresp_master_id)
               ->  Seq Scan on correspondence_master cm  (cost=0.00..777996.60 rows=33964871 width=12) (actual time=0.024..10206.688 rows=33966957 loops=1)
                     Filter: (person_id IS NOT NULL)
               ->  Hash  (cost=530987.51..530987.51 rows=4202314 width=16) (actual time=3848.577..3848.577 rows=5297729 loops=1)
                     ->  Bitmap Heap Scan on correspondence c  (cost=76346.23..530987.51 rows=4202314 width=16) (actual time=673.660..2272.497 rows=5297729 loops=1)
                           Recheck Cond: (corresp_type_id = ANY ('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                           ->  Bitmap Index Scan on corresp_type_fk  (cost=0.00..75295.65 rows=4202314 width=0) (actual time=640.301..640.301 rows=5297729 loops=1)
                                 Index Cond: (corresp_type_id = ANY ('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
         ->  Hash  (cost=115091.03..115091.03 rows=2567289 width=4) (actual time=4484.737..4484.737 rows=3101076 loops=1)
               ->  Seq Scan on person p  (cost=0.00..115091.03 rows=2567289 width=4) (actual time=0.013..3406.661 rows=3101076 loops=1)
                     Filter: (active AND (NOT exclude_walklist_alt) AND (postal_address_id IS NULL) AND (NOT unpublished) AND (enrolment_status_id = ANY ('{E,T}'::text[])) AND (person_type = ANY ('{M,D,O}'::text[])))
   ->  Hash  (cost=107100.99..107100.99 rows=2799260 width=8) (actual time=2962.039..2962.039 rows=2854070 loops=1)
         ->  Index Scan using audit_log_audit_timestamp on audit_log gen_al  (cost=0.00..107100.99 rows=2799260 width=8) (actual time=0.181..2055.802 rows=2854070 loops=1)
               Index Cond: (audit_timestamp > '2011-02-19 13:05:00'::timestamp without time zone)
 Total runtime: 63007.358 ms

Cheers

Mark
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> I am in the progress of an 8.3 to 8.4 upgrade for a customer. I seem to
> have stumbled upon what looks like a regression. The two databases
> (8.3.14 and 8.4.8) have identical tuning parameters (where that makes
> sense) and run on identical hardware. Both databases are regularly
> vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
> does not change the plans shown below.

Hmmm ... this is structurally a pretty simple query, so I'm surprised
that 8.3 and 8.4 see it very much differently.  The relation-level
estimates and plan choices are very nearly the same; the only thing
that's changed much is the estimates of the join sizes, and there were
not that many changes in the join selectivity estimation for simple
inner joins.  I wonder whether you are seeing a bad side-effect of this
patch:

http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30

That code would only be reached when one or both join columns lack MCV
lists in pg_stats; if you had analyzed, the only reason for that to be
the case is if the column is unique (or nearly so, in ANALYZE's opinion).

regards, tom lane
On 24/08/11 15:15, Tom Lane wrote:
> Mark Kirkwood<mark.kirkwood@catalyst.net.nz> writes:
>> I am in the progress of an 8.3 to 8.4 upgrade for a customer. I seem to
>> have stumbled upon what looks like a regression. The two databases
>> (8.3.14 and 8.4.8) have identical tuning parameters (where that makes
>> sense) and run on identical hardware. Both databases are regularly
>> vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
>> does not change the plans shown below.
>
> Hmmm ... this is structurally a pretty simple query, so I'm surprised
> that 8.3 and 8.4 see it very much differently.  The relation-level
> estimates and plan choices are very nearly the same; the only thing
> that's changed much is the estimates of the join sizes, and there were
> not that many changes in the join selectivity estimation for simple
> inner joins.  I wonder whether you are seeing a bad side-effect of this
> patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>
> That code would only be reached when one or both join columns lack MCV
> lists in pg_stats; if you had analyzed, the only reason for that to be
> the case is if the column is unique (or nearly so, in ANALYZE's opinion).
>

Right, that will be the case - audit_id is the primary key for audit_log.
Stats entries for the join columns look like:

=# SELECT tablename, attname, n_distinct, most_common_vals,
          most_common_freqs, histogram_bounds
   FROM pg_stats
   WHERE tablename IN ('correspondence', 'audit_log')
   AND attname IN ('audit_id', 'generated_audit_id');

-[ RECORD 1 ]-----+---------------------------------------------------------
tablename         | correspondence
attname           | generated_audit_id
n_distinct        | 4625
most_common_vals  |
{11983812,15865407,4865496,717803,842478,725709,7255002,2389608,4604147,9996442,8693810,4604145,5916872,2389606,3135764,3307895,10527855,7254994,8959356,9595632,6279892,9595640,2604937,5916870,6279950,1180586,2604768,1180638,11526036,4451499,5252795,6279919,6279955,8958886,2604929,6279904,7543722,8959031,2604804,7543823,8958930,8959226,1180650,2604871,3530205,6279960,11051216,11051224,3530140,7838365,15060203,1180309,1180423,3530177,7543749,7543790,8959026,8959083,12834024,1180447,1180632,1180664,2604779,2604901,2604943,6279944,6280027,7543820,8958992,8959011,3530107,6279923,7543085,15866296,1180470,1180473,2604846,2604874,2604892,6279977,6280046,7543496,8958904,8958914,1180281,1180497,2604801,2604973,3529965,6280051,7543654,7543667,7543815,2604840,2604852,2604877,6279947,6279991,6280016,6280095} most_common_freqs | {0.0787667,0.0769333,0.00906667,0.00886667,0.00826667,0.00593333,0.00326667,0.003,0.00293333,0.0027,0.00266667,0.00263333,0.00256667,0.0025,0.00246667,0.00203333,0.00203333,0.00196667,0.0019,0.00186667,0.00183333,0.0018,0.00173333,0.00173333,0.00173333,0.0017,0.0017,0.00166667,0.00166667,0.00163333,0.00163333,0.00163333,0.00163333,0.00163333,0.0016,0.0016,0.0016,0.0016,0.00156667,0.00156667,0.00156667,0.00156667,0.00153333,0.00153333,0.00153333,0.00153333,0.00153333,0.00153333,0.0015,0.0015,0.0015,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00146667,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.00143333,0.0014,0.0014,0.0014,0.0014,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00136667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.0013,0.0013,0.0013,0.0013,0.0013,0.0013,0.0013} histogram_bounds | 
{-614,149124,436276,734992,1111802,1180324,1180449,1180481,1180507,1180610,1180640,1180656,1180672,1475625,1671884,1882852,2257454,2521497,2604750,2604785,2604821,2604857,2604895,2604923,2604957,2683740,3050195,3264561,3529673,3529821,3529894,3530041,3530072,3530093,3530125,3530151,3530181,3530216,3655474,3947599,4230064,4451407,4451648,4604143,4899541,5229325,5442183,5783894,6044973,6279792,6279830,6279872,6279934,6279988,6280024,6280057,6280087,6448106,6666623,6935161,7223774,7543005,7543220,7543548,7543678,7543706,7543733,7543763,7543785,7543831,7730234,8168222,8473126,8704950,8958785,8958894,8958920,8958946,8958981,8959021,8959054,8960124,8963427,9092223,9393810,9649295,9915513,10116459,10340456,10533434,10908764,11474630,12282455,13428124,14054953,14755339,15060207,15769093,16442810,17071416,17860068} -[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- tablename | audit_log attname | audit_id n_distinct | -1 most_common_vals | most_common_freqs | histogram_bounds | 
{-899,172915,346206,520991,707646,900140,1090647,1274076,1455922,1631357,1802760,1992032,2160450,2341946,2514505,2670638,2851069,3031271,3190297,3359936,3536716,3706348,3899491,4067528,4232343,4405734,4574480,4753591,4930502,5122384,5287148,5460009,5657326,5824340,6020883,6214608,6409401,6606366,6779433,6945221,7123123,7294108,7495488,7649303,7816323,7997936,8191973,8362771,8526974,8733309,8911487,9099916,9289773,9472155,9661398,9825969,10004845,10176201,10351232,10527642,10680265,10853519,11040326,11229650,11422181,11605451,11806172,11985734,12171654,12364324,12559368,12729402,12912927,13073102,13287145,13455458,13649471,13826738,14004258,14187125,14356543,14539334,14715631,14895857,15060855,15231913,15404735,15577098,15742060,15901413,16088450,16270629,16458319,16650444,16826581,17003138,17158176,17315993,17497551,17687046,17879372} Cheers Mark
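[Editor's aside: to make the effect of that commit concrete, here is a rough back-of-envelope sketch of the no-MCV join selectivity rule, before and after the 7f3eba30 clamp. This is a deliberate simplification of eqjoinsel_inner, not the real C code, and the row counts are illustrative assumptions loosely matched to the plans and pg_stats output above (generated_audit_id has n_distinct = 4625; audit_id is unique over a large audit_log of which roughly 2.8M rows pass the timestamp filter).]

```python
# Sketch of the planner's join selectivity when a join column lacks an
# MCV list: selectivity = 1 / max(nd_outer, nd_inner).  Commit 7f3eba30
# additionally clamps each nd to the estimated row count of that side
# *after* its restriction clauses.  Simplified illustration only.

def join_rows(outer_rows, inner_rows, nd_outer, nd_inner,
              clamp_outer=None, clamp_inner=None):
    """Estimated output rows of an inner equi-join, no-MCV path."""
    if clamp_outer is not None:
        nd_outer = min(nd_outer, clamp_outer)
    if clamp_inner is not None:
        nd_inner = min(nd_inner, clamp_inner)
    sel = 1.0 / max(nd_outer, nd_inner)
    return outer_rows * inner_rows * sel

# Illustrative numbers (assumptions, not measured values):
corresp_rows, nd_gen = 4_200_000, 4625        # filtered correspondence rows
audit_total, audit_filtered = 18_000_000, 2_800_000  # unique audit_id

old = join_rows(corresp_rows, audit_filtered, nd_gen, audit_total)
new = join_rows(corresp_rows, audit_filtered, nd_gen, audit_total,
                clamp_inner=audit_filtered)

print(f"without clamp: {old:,.0f} rows")  # nd stays ~18M -> small estimate
print(f"with clamp:    {new:,.0f} rows")  # nd clamped to 2.8M -> much larger
```

With these made-up numbers the clamped estimate comes out roughly 6x larger than the unclamped one, which is in the same ballpark as the jump from ~663k (8.3) to ~4.2M (8.4) seen on the c/gen_al join in the plans above.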
On 24/08/11 15:15, Tom Lane wrote:
>
> Hmmm ... this is structurally a pretty simple query, so I'm surprised
> that 8.3 and 8.4 see it very much differently.  The relation-level
> estimates and plan choices are very nearly the same; the only thing
> that's changed much is the estimates of the join sizes, and there were
> not that many changes in the join selectivity estimation for simple
> inner joins.  I wonder whether you are seeing a bad side-effect of this
> patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>

Here is what the plan looks like with that patch reversed (it is back to 8.3 speed too).

QUERY PLAN 8.4 - 7f3eba30
-------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=948567.18..1882454.51 rows=349702 width=0) (actual time=12320.037..38146.697 rows=217427 loops=1)
   Hash Cond: (cm.person_id = p.person_id)
   ->  Hash Join  (cost=791689.32..1702481.12 rows=581887 width=4) (actual time=7492.004..32727.783 rows=248441 loops=1)
         Hash Cond: (cm.corresp_master_id = c.corresp_master_id)
         ->  Seq Scan on correspondence_master cm  (cost=0.00..777460.25 rows=34003380 width=12) (actual time=0.016..8977.181 rows=33960209 loops=1)
               Filter: (person_id IS NOT NULL)
         ->  Hash  (cost=783297.43..783297.43 rows=671351 width=8) (actual time=7375.019..7375.019 rows=354456 loops=1)
               ->  Hash Join  (cost=231577.28..783297.43 rows=671351 width=8) (actual time=6374.538..7257.067 rows=354456 loops=1)
                     Hash Cond: (c.generated_audit_id = gen_al.audit_id)
                     ->  Bitmap Heap Scan on correspondence c  (cost=77121.49..532445.85 rows=4247118 width=16) (actual time=742.738..2790.225 rows=5293603 loops=1)
                           Recheck Cond: (corresp_type_id = ANY ('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                           ->  Bitmap Index Scan on corresp_type_fk  (cost=0.00..76059.71 rows=4247118 width=0) (actual time=708.164..708.164 rows=5293603 loops=1)
                                 Index Cond: (corresp_type_id = ANY ('{CL11,CL11A,CL12,CL15,CL15A,CL16,DM_1,DM_2}'::text[]))
                     ->  Hash  (cost=108073.47..108073.47 rows=2827066 width=8) (actual time=2759.145..2759.145 rows=2819891 loops=1)
                           ->  Index Scan using audit_log_audit_timestamp on audit_log gen_al  (cost=0.00..108073.47 rows=2827066 width=8) (actual time=0.085..1800.175 rows=2819891 loops=1)
                                 Index Cond: (audit_timestamp > '2011-02-19 13:05:00'::timestamp without time zone)
   ->  Hash  (cost=115044.00..115044.00 rows=2549829 width=4) (actual time=4827.310..4827.310 rows=3101177 loops=1)
         ->  Seq Scan on person p  (cost=0.00..115044.00 rows=2549829 width=4) (actual time=0.061..3600.767 rows=3101177 loops=1)
               Filter: (active AND (NOT exclude_walklist_alt) AND (postal_address_id IS NULL) AND (NOT unpublished) AND (enrolment_status_id = ANY ('{E,T}'::text[])) AND (person_type = ANY ('{M,D,O}'::text[])))
 Total runtime: 38171.865 ms

Cheers

Mark
On 24/08/11 17:22, Mark Kirkwood wrote:
> On 24/08/11 15:15, Tom Lane wrote:
>>
>> Hmmm ... this is structurally a pretty simple query, so I'm surprised
>> that 8.3 and 8.4 see it very much differently.  The relation-level
>> estimates and plan choices are very nearly the same; the only thing
>> that's changed much is the estimates of the join sizes, and there were
>> not that many changes in the join selectivity estimation for simple
>> inner joins.  I wonder whether you are seeing a bad side-effect of this
>> patch:
>>
>> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>>
>
> Here is what the plan looks like with that patch reversed (it is back
> to 8.3 speed too).
>
> QUERY PLAN 8.4 - 7f3eba30
> (better plan snipped)
>

I note from the commit message that the test case for the fix was from Grzegorz Jaskiewicz (an antijoin against a small subset of a relation). I was not able to find it in the archives - Grzegorz, do you recall the actual test case? I thought it might be useful for me to spend some time studying both cases and seeing if I can come up with any tweaks that would let both your queries and mine work well!

Cheers

Mark
2011/8/29 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
> I note from the commit message that the fix test case was from Grzegorz
> Jaskiewicz (antijoin against a small subset of a relation). I was not able
> to find this in the archives - Grzegorz do you recall the actual test case?
> I thought it might be useful for me to spend some time studying both cases
> and seeing if I can come up with any tweaks that would let both your and my
> queries work well!

Sorry, I don't remember that particular example. If I complained about
it, it would have been on this list or the general list.
I'll have a look by date.

-- 
GJ
On 30/08/11 21:43, Grzegorz Jaśkiewicz wrote:
> 2011/8/29 Mark Kirkwood<mark.kirkwood@catalyst.net.nz>:
>
>> I note from the commit message that the fix test case was from Grzegorz
>> Jaskiewicz (antijoin against a small subset of a relation). I was not able
>> to find this in the archives - Grzegorz do you recall the actual test case?
>> I thought it might be useful for me to spend some time studying both cases
>> and seeing if I can come up with any tweaks that would let both your and my
>> queries work well!
>
> Sorry, I don't remember that particular example. If I complained about
> it, it would have been on this list or the general list.
> I'll have a look by date.
>

Thanks - however I think I have managed to make up a good test case that shows the particular commit working. More on that to come!

Cheers

Mark
On 24/08/11 15:15, Tom Lane wrote:
> Mark Kirkwood<mark.kirkwood@catalyst.net.nz> writes:
>> I am in the progress of an 8.3 to 8.4 upgrade for a customer. I seem to
>> have stumbled upon what looks like a regression. The two databases
>> (8.3.14 and 8.4.8) have identical tuning parameters (where that makes
>> sense) and run on identical hardware. Both databases are regularly
>> vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
>> does not change the plans shown below.
>
> Hmmm ... this is structurally a pretty simple query, so I'm surprised
> that 8.3 and 8.4 see it very much differently.  The relation-level
> estimates and plan choices are very nearly the same; the only thing
> that's changed much is the estimates of the join sizes, and there were
> not that many changes in the join selectivity estimation for simple
> inner joins.  I wonder whether you are seeing a bad side-effect of this
> patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
>
> That code would only be reached when one or both join columns lack MCV
> lists in pg_stats; if you had analyzed, the only reason for that to be
> the case is if the column is unique (or nearly so, in ANALYZE's opinion).
>

I've come up with (hopefully) a good set of semi, anti and regular joins to demonstrate the effect of this commit. I've attached them, along with the schema generator (I believe I've used this before for optimization examples...). I've also tried out an experimental patch that aims to get better row estimates both for joins like the one I'm having trouble with *and* for the anti joins the commit was for.
So firstly consider an anti join (these are run against git HEAD rather than 8.4):

EXPLAIN ANALYZE
SELECT 1
FROM nodekeyword nk
WHERE nk.keywordid < 100000
AND NOT EXISTS
    (SELECT 1 FROM node n
     WHERE n.nodeid = nk.nodeid
     AND n.updated > '2011-12-01'::timestamp);

With commit:

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=426079.34..765699.66 rows=1599293 width=0) (actual time=29907.716..47255.825 rows=1839193 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.373..11838.738 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29883.980..29883.980 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.339..29295.764 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Without commit:

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=426079.34..760501.96 rows=1 width=0) (actual time=30409.336..47919.613 rows=1839193 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.359..12081.372 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=30392.235..30392.235 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.384..29806.407 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Note that the rows estimate for the anti join is hopelessly wrong without the commit, so clearly the commit does the job here (I think this models the test case for said commit)!

Now some joins:

EXPLAIN ANALYZE
SELECT 1
FROM node n
JOIN nodekeyword nk ON (n.nodeid = nk.nodeid)
WHERE n.updated > '2011-01-01'::timestamp
AND nk.keywordid < 100000;

With commit:

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=30032.836..53073.731 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.327..14393.629 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30017.777..30017.777 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.005..23272.287 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

Without commit:

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=501666.88..871510.70 rows=1991365 width=0) (actual time=30549.498..54852.399 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.331..13760.417 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30534.464..30534.464 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.005..23696.167 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

Another join:

EXPLAIN ANALYZE
SELECT 1
FROM node n
JOIN nodekeyword nk ON (n.nodeid = nk.nodeid)
WHERE n.updated > '2011-12-01'::timestamp
AND nk.keywordid < 100000;

With commit:

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=426079.34..764424.63 rows=392267 width=0) (actual time=29295.966..45578.876 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.452..12367.760 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29273.571..29273.571 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=10.899..28678.818 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Without commit:

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=426079.34..762064.41 rows=156245 width=0) (actual time=29179.313..44605.243 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.486..11546.469 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29156.889..29156.889 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=10.915..28545.553 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

So in the case where we filter out a large percentage of the rows, the commit inflates the estimates... consider a more extreme example:

EXPLAIN ANALYZE
SELECT 1
FROM node n
JOIN nodekeyword nk ON (n.nodeid = nk.nodeid)
WHERE n.updated > '2011-12-27'::timestamp
AND nk.keywordid < 10000;

With commit:

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..569488.45 rows=16344 width=0) (actual time=55.452..65341.135 rows=604 loops=1)
   ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4) (actual time=13.537..46138.214 rows=14952 loops=1)
         Filter: (updated > '2011-12-27 00:00:00'::timestamp without time zone)
   ->  Index Scan using nodekeyword_pk on nodekeyword nk  (cost=0.00..9.16 rows=1 width=4) (actual time=1.277..1.279 rows=0 loops=14952)
         Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))

Without commit:

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..569488.45 rows=631 width=0) (actual time=43.969..64988.036 rows=604 loops=1)
   ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4) (actual time=2.060..46065.879 rows=14952 loops=1)
         Filter: (updated > '2011-12-27 00:00:00'::timestamp without time zone)
   ->  Index Scan using nodekeyword_pk on nodekeyword nk  (cost=0.00..9.16 rows=1 width=4) (actual time=1.259..1.260 rows=0 loops=14952)
         Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))

So clearly this commit is not so good for this type of join (this models the case I posted initially).
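[Editor's aside: the asymmetry above makes sense if you sketch the anti-join arithmetic as well. For an anti join over a unique inner key, the fraction of outer rows that find a match is roughly the number of distinct values surviving on the inner side divided by the distinct values on the outer side; clamping the inner ndistinct to the filtered inner row count is exactly what stops the planner from assuming every outer row matches. The following is a loose simplification with made-up-but-representative numbers, not the real selfuncs.c logic.]

```python
# Loose sketch of anti-join row estimation over a unique inner key.
# Outer: ~2M filtered nodekeyword rows spread over ~5M distinct nodeids.
# Inner: node.nodeid is unique (~5M distinct), but the timestamp filter
# keeps only ~400k rows.  All numbers are illustrative assumptions.

def anti_join_rows(outer_rows, nd_outer, nd_inner, clamp_inner=None):
    """Outer rows whose key has no partner on the unique-keyed inner side."""
    if clamp_inner is not None:
        nd_inner = min(nd_inner, clamp_inner)
    match_fraction = min(1.0, nd_inner / nd_outer)
    return outer_rows * (1.0 - match_fraction)

outer_rows, nd_node, inner_filtered = 2_000_000, 5_000_000, 400_000

# Without the clamp, the planner thinks the inner side still covers all
# ~5M nodeids, so every outer row "matches" and the anti join estimate
# collapses to ~0 (the rows=1 seen above without the commit).
print(anti_join_rows(outer_rows, nd_node, nd_node))
# With the clamp, the match fraction drops to 400k/5M = 0.08, giving
# roughly 2M * 0.92 = ~1.84M surviving rows - close to the actual count.
print(anti_join_rows(outer_rows, nd_node, nd_node, clamp_inner=inner_filtered))
```

So the clamp fixes exactly the situation the commit targeted (anti/semi joins against a filtered subset), while the same clamp inflates plain inner-join estimates when the filter is selective, as the inner-join examples above show.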
Now four semi joins:

EXPLAIN ANALYZE
SELECT 1
FROM nodekeyword nk
WHERE nk.keywordid < 100000
AND EXISTS
    (SELECT 1 FROM node n
     WHERE n.nodeid = nk.nodeid
     AND n.updated > '2011-12-01'::timestamp);

With commit:

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=426079.34..753629.40 rows=392267 width=0) (actual time=28405.965..43724.471 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.767..11561.340 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=28391.293..28391.293 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.038..27820.097 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Without commit:

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=426079.34..780417.56 rows=1991560 width=0) (actual time=29447.638..44738.280 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.771..11501.350 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29433.952..29433.952 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.040..28850.800 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Clearly row estimation is hopelessly broken *without* this commit here.
Another semi join:

EXPLAIN ANALYZE
SELECT 1
FROM nodekeyword nk
WHERE nk.keywordid < 100000
AND EXISTS (SELECT 1 FROM node n
            WHERE n.nodeid = nk.nodeid
            AND n.updated > '2011-01-01'::timestamp);

With commit:
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=29048.154..51230.453 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.423..13430.618 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29024.442..29024.442 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.010..22384.904 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

Without commit:
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=28914.970..51162.918 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.504..13780.506 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=28891.705..28891.705 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.008..22082.459 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

Another semi join:

EXPLAIN ANALYZE
SELECT 1
FROM nodekeyword nk
WHERE nk.keywordid < 10000
AND EXISTS (SELECT 1 FROM node n
            WHERE n.nodeid = nk.nodeid
            AND n.updated > '2011-01-01'::timestamp);

With commit:
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0) (actual time=30120.347..49646.175 rows=199050 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.359..16335.889 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30072.444..30072.444 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..23409.799 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

Without commit:
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0) (actual time=29395.513..48857.600 rows=199050 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.528..16261.983 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29348.826..29348.826 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..22505.930 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

Final semi join:

EXPLAIN ANALYZE
SELECT 1
FROM nodekeyword nk
WHERE nk.keywordid < 10000
AND EXISTS (SELECT 1 FROM node n
            WHERE n.nodeid = nk.nodeid
            AND n.updated > '2011-12-01'::timestamp);

With commit:
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0) (actual time=29060.665..44713.615 rows=16003 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.366..15064.457 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29026.017..29026.017 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.039..28441.039 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Without commit:
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0) (actual time=28969.107..43725.339 rows=16003 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.486..14198.613 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=28935.248..28935.248 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.047..28343.005 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Well, this guy is too sneaky for either case :-( We seem to need a patch variant that *only* clamps the estimates in the anti or semi join case, e.g. (note: against git HEAD):

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e065826..bf5002f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2257,11 +2257,6 @@ eqjoinsel_inner(Oid operator,
 	double		nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
 	double		nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;
 
-	if (vardata1->rel)
-		nd1 = Min(nd1, vardata1->rel->rows);
-	if (vardata2->rel)
-		nd2 = Min(nd2, vardata2->rel->rows);
-
 	selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
 	if (nd1 > nd2)
 		selec /= nd1;

Now run all the queries, first the anti join:

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=426079.34..765699.66 rows=1599293 width=0) (actual time=30121.008..48503.453 rows=1839193 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.623..12853.522 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=30108.058..30108.058 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.347..29508.393 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

And the 3 joins:

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=501666.88..871510.70 rows=1991365 width=0) (actual time=30148.073..52370.308 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.291..13300.233 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30124.453..30124.453 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..23334.774 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=426079.34..762064.41 rows=156245 width=0) (actual time=29954.251..46014.379 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.420..12126.142 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29936.578..29936.578 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=10.934..29357.789 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..569488.45 rows=631 width=0) (actual time=44.065..67179.686 rows=604 loops=1)
   ->  Seq Scan on node n  (cost=0.00..419643.00 rows=16344 width=4) (actual time=2.165..48523.075 rows=14952 loops=1)
         Filter: (updated > '2011-12-27 00:00:00'::timestamp without time zone)
   ->  Index Scan using nodekeyword_pk on nodekeyword nk  (cost=0.00..9.16 rows=1 width=4) (actual time=1.241..1.242 rows=0 loops=14952)
         Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))

And the 4 semi joins...
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=426079.34..753629.40 rows=392267 width=0) (actual time=29355.949..45220.958 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.731..11983.132 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29342.387..29342.387 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.039..28763.514 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=30823.334..53136.910 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.555..13881.366 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30800.017..30800.017 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.010..24028.932 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=501666.88..823736.17 rows=192921 width=0) (actual time=29278.861..48346.647 rows=199050 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.523..15809.390 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29232.161..29232.161 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..22541.899 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=426079.34..730392.78 rows=192921 width=0) (actual time=28594.210..42976.001 rows=16003 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.581..13810.924 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=392267 width=4) (actual time=28560.258..28560.258 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.048..27983.235 rows=401678 loops=1)
               Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

(This last one was wildly inaccurate pre-patch.)

So this looks quite encouraging (unless I have overlooked a set of queries that now perform worse, which could be the case) - thoughts?

regards

Mark
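To make the arithmetic in the removed hunk concrete: with the clamp gone, the no-MCV branch of eqjoinsel_inner reduces to selec = (1 - nullfrac1) * (1 - nullfrac2) / max(nd1, nd2), and the planner's join size is roughly outer_rows * inner_rows * selec. The following sketch is a simplified model, not the real selfuncs.c code (which also handles MCV lists); it assumes about 5 million distinct nodeid values, an inference from the plans rather than a measured figure, and with that assumption it reproduces the 156245-row Hash Join estimate above:

```python
# Simplified model of the no-MCV branch of eqjoinsel_inner with the
# clamp removed (see the diff above); the real code also handles MCVs.

def eqjoinsel_inner(nd1, nd2, nullfrac1=0.0, nullfrac2=0.0):
    """Equijoin selectivity: non-null fraction over the larger ndistinct."""
    return (1.0 - nullfrac1) * (1.0 - nullfrac2) / max(nd1, nd2)

def join_rows(outer_rows, inner_rows, selec):
    """Join size estimate: cross-product size scaled by selectivity."""
    return outer_rows * inner_rows * selec

# Estimated input sizes from the second plain-join plan above,
# assuming ~5M distinct nodeid values (hypothetical, inferred).
selec = eqjoinsel_inner(nd1=5_000_000, nd2=5_000_000)
est = join_rows(outer_rows=1_991_560, inner_rows=392_267, selec=selec)
assert int(round(est)) == 156245  # matches the planner's rows=156245
```

Note how, without the clamp, both base restrictions feed into the estimate multiplicatively, which is why the plain-join estimates above land so close to the actual row counts.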
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> [ assorted examples showing that commit
> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]

Thanks for the test cases. After playing with these for a bit I believe I've figured out the error in my previous thinking. Clamping the ndistinct value like that can improve matters when applied to the inside relation of a semi or anti join, but in all other cases it's just wrong.

If you think about what is happening in eqjoinsel_inner with the patch, we are reducing the ndistinct estimate for the join key column proportionally to the selectivity of whatever baserel restrictions apply. This then results in proportionally increasing the selectivity number for the join condition --- in other words, we're more or less cancelling out the effects of one or the other relation's base restrictions. So that's pretty broken in general.

The reason it is important for semi/antijoin inner relations is that this is actually the only way that restrictions applied to the inner rel get to impact the join size estimate at all, since set_joinrel_size_estimates is not going to factor the inner rel size into what it multiplies the join selectivity against. In short, I was mistakenly extrapolating from the observation that it helped to hack the ndistinct estimate for a semijoin's inner rel, to the conclusion that we should do that for all join input rels.

So, not only are you correct that we should revert the changes to eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too. It should only be clamping the ndistinct value for the inner side. And I think it needs to be taking that into account for the case where it does have MCVs as well as the case where it doesn't.

So I'll go back to this with hopefully a clearer picture of what's happening. Thanks again for the test cases.

regards, tom lane
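The cancellation described here can be seen with a toy calculation (a model of the idea, not the actual selfuncs.c code), reusing the row counts from the earlier plans and the same assumed ~5M distinct nodeid values: once ndistinct is clamped to each side's restricted row count, dividing by max(nd1, nd2) exactly cancels the larger side's base restriction.

```python
# Toy model of why clamping ndistinct to the restricted relation size
# cancels a base restriction's effect on an inner-join estimate.

def estimate(outer_rows, inner_rows, nd1, nd2, clamp):
    if clamp:                        # behaviour of commit 7f3eba30c9d6
        nd1 = min(nd1, outer_rows)   # ndistinct can't exceed surviving rows
        nd2 = min(nd2, inner_rows)
    return outer_rows * inner_rows / max(nd1, nd2)

outer, inner, nd = 1_991_560, 392_267, 5_000_000  # from the plans above

unclamped = estimate(outer, inner, nd, nd, clamp=False)  # honours both filters
clamped = estimate(outer, inner, nd, nd, clamp=True)

# With the clamp, max(nd1', nd2') == outer_rows, so dividing by it
# cancels the outer restriction: the estimate collapses to inner_rows,
# as if every inner row found a join partner.
assert clamped == inner
assert unclamped < clamped
```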
I wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
>> [ assorted examples showing that commit
>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
> ...
> So, not only are you correct that we should revert the changes to
> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.

I've retested these examples with the patches I committed yesterday. Six of the eight examples are estimated pretty nearly dead on, while the other two are estimated about 50% too high (still a lot better than before). AFAICT there's no easy way to improve those estimates further; eqjoinsel_semi just plain hasn't got enough information to know how many matches there will be.

regards, tom lane
On 02/09/11 11:13, Tom Lane wrote:
> I wrote:
>> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
>>> [ assorted examples showing that commit
>>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
>> ...
>> So, not only are you correct that we should revert the changes to
>> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
> I've retested these examples with the patches I committed yesterday.
> Six of the eight examples are estimated pretty nearly dead on, while the
> other two are estimated about 50% too high (still a lot better than
> before). AFAICT there's no easy way to improve those estimates further;
> eqjoinsel_semi just plain hasn't got enough information to know how many
> matches there will be.

Just noticed your two commits this morning and ran them through the examples too - results look really good! Not only are the plain join queries looking way better, but that last semi join that was way off is now being estimated pretty close. Should be interesting to see how much this improves more complex queries!

Cheers

Mark
On 02/09/11 11:18, Mark Kirkwood wrote:
> On 02/09/11 11:13, Tom Lane wrote:
>> I've retested these examples with the patches I committed yesterday.
>> Six of the eight examples are estimated pretty nearly dead on, while the
>> other two are estimated about 50% too high (still a lot better than
>> before). AFAICT there's no easy way to improve those estimates further;
>> eqjoinsel_semi just plain hasn't got enough information to know how many
>> matches there will be.
>
> Just noticed your two commits this morning and ran them through the
> examples too - results look really good! Not only are the plain join
> queries looking way better but that last semi join that was way off is
> now being estimated pretty close. Should be interesting to see how
> much this improves more complex queries!
While this is still fresh in your mind, a couple of additional anti join queries are still managing to sneak past estimation:

EXPLAIN ANALYZE
SELECT 1
FROM nodekeyword nk
WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n
                WHERE n.nodeid = nk.nodeid
                AND n.updated > '2011-01-01'::timestamp);

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=501666.88..851597.05 rows=1 width=0) (actual time=29956.971..50933.702 rows=5914 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29345.238..29345.238 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.010..22731.316 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

EXPLAIN ANALYZE
SELECT 1
FROM nodekeyword nk
WHERE nk.keywordid < 10000
AND NOT EXISTS (SELECT 1 FROM node n
                WHERE n.nodeid = nk.nodeid
                AND n.updated > '2011-01-01'::timestamp);

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=501666.88..821806.96 rows=1 width=0) (actual time=46497.231..49196.057 rows=566 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29901.178..29901.178 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.008..23207.964 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> While this is still fresh in your mind, a couple of additional anti join
> queries are still managing to sneak past estimation:

Yeah, those are estimating that all the outer rows have join partners, because there are more distinct values in the sub-select than there are in the outer relation. AFAICS there are not any errors in the statistics, it's just that the estimation rule falls down here.

If you've heard of a better estimator for semijoin/antijoin selectivity, I'm all ears. The best idea I have at the moment is to put an arbitrary upper limit on the estimated selectivity, but that would be, well, arbitrary.

regards, tom lane
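A toy version of the estimation rule described here (a hypothetical simplification, not the actual eqjoinsel_semi code) shows why these anti joins collapse to rows=1: the fraction of outer rows assumed to have a join partner is roughly nd_inner / nd_outer capped at 1.0, so whenever the sub-select side has at least as many distinct values as the outer side, every outer row is presumed matched and the anti join keeps (almost) nothing.

```python
# Toy model of the semijoin match-fraction rule: outer rows assumed
# matched in proportion to how many distinct inner values exist per
# distinct outer value, capped at 1.0.

def semi_match_fraction(nd_outer, nd_inner):
    return min(nd_inner / nd_outer, 1.0)

def anti_join_rows(outer_rows, nd_outer, nd_inner):
    rows = outer_rows * (1.0 - semi_match_fraction(nd_outer, nd_inner))
    return max(int(round(rows)), 1)   # planner clamps estimates to >= 1

# The keywordid < 100000 outer side can have at most ~2M distinct
# nodeids, while the sub-select on updated > '2011-01-01' retains ~5M
# (both figures inferred from the plans): nd_inner >= nd_outer, so the
# anti join estimate collapses to 1 row (actual: 5914).
assert anti_join_rows(1_991_560, nd_outer=1_991_560, nd_inner=4_999_510) == 1
```

With a much smaller inner side (e.g. the '2011-12-01' cutoff, ~400k distinct values), the same rule leaves most outer rows unmatched, which is why those anti join estimates came out sane.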