Re: nested loop semijoin estimates - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: nested loop semijoin estimates |
Date | |
Msg-id | 556A2FD2.5020302@2ndquadrant.com Whole thread Raw |
In response to | Re: nested loop semijoin estimates (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: nested loop semijoin estimates
|
List | pgsql-hackers |
Hi, On 05/30/15 23:16, Tom Lane wrote: > I wrote: >> So what this seems to mean is that for SEMI/ANTI join cases, we have to >> postpone all of the inner scan cost determination to final_cost_nestloop, >> so that we can do this differently depending on whether >> has_indexed_join_quals() is true. That's a little bit annoying because it >> will mean we take the shortcut exit less often; but since SEMI/ANTI joins >> aren't that common, it's probably not going to be a big planning time hit. > > Attached is a draft patch for that. It fixes the problem for me: > > Nested Loop Semi Join (cost=0.99..9.09 rows=1 width=74) (actual time=0.591..1.554 rows=2 loops=1) > -> Index Scan using term_facttablename_columnname_idx on term t (cost=0.55..8.57 rows=1 width=74) (actual time=0.022..0.025rows=2 loops=1) > Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text)) > -> Index Only Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f (cost=0.43..143244.98 rows=5015134width=2) (actual time=0.759..0.759 rows=1 loops=2) > Index Cond: (berechnungsart = (t.term)::text) > Heap Fetches: 0 > Planning time: 0.545 ms > Execution time: 1.615 ms Seems to be working OK, but I still do get a Bitmap Heap Scan there (but more about that later). Do you plan to push that into 9.5, or 9.6? I assume it's a behavior change so that no back-patching, right? > >> Not sure yet about your other point about the indexscan getting >> rejected too soon. That doesn't seem to be happening for me, at >> least not in HEAD. > > I do see something of the sort if I turn off enable_indexonlyscan. > Not sure about a good fix for that aspect. It may not be terribly > critical, since AFAICS index-only scans generally ought to apply > in these cases. Hmmm, a VACUUM FREEZE fixed that for me. The reason is that right after loading the testcase, I do get this: Index Only Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f (cost=0.43..280220.51 rows=5000016width=2) and after VACUUM FREEZE I do get this: Index Only Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f (cost=0.43..142344.43 rows=5000000width=2) and the Bitmap Heap Scan case looks like this: Bitmap Heap Scan on facttable_stat_fta4 f (cost=93594.56..200342.76 rows=5000016 width=2) so it's cheaper (total cost) than the index only scan before freezing, and more expensive than index only scan after freezing. I still think this is wrong (or rather "suboptimal") - there are probably cases where even the "freezed" index only scan is more expensive than a bitmap heap scan, and in that case the it won't be used, although it'd be much faster. Another example is a query with a plain index scan, e.g. consider this slight modification of the query: SELECT facttablename, columnname, term FROM term t WHERE facttablename='facttable_stat_fta4' AND columnname='berechnungsart' AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f WHERE f.berechnungsart=t.term AND einheit IS NOT NULL); This will result in bitmap index scan no matter the visibility. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: