Thread: Fix gin index cost estimation
Hello,

Following the bug report at [1], I sent the attached patch to the pgsql-bugs
mailing list. I'm starting a thread here to add it to the next commitfest.

The problem I'm trying to solve is that, contrary to btree, gist and sp-gist
indexes, gin indexes do not charge any cpu cost for descending the entry
tree. This can be a problem in cases where the IO cost is very low. That can
happen with manual tuning of course, but more surprisingly when the IO cost
is amortized over a large number of iterations in a nested loop: in that
case, we basically consider it free, since everything should already be in
shared buffers. This leads to some inefficient plans, as an equivalent btree
index should be picked instead.

This has been discovered in PG14, as this release makes it possible to use a
pg_trgm gin index with the equality operator. Before that, only the btree
would have been considered, and as such the discrepancy in the way we charge
cpu cost didn't have noticeable effects. However, I suspect users of
btree_gin could have the same kind of problems in prior versions.

Best regards,

[1]: https://www.postgresql.org/message-id/flat/2187702.iZASKD2KPV%40aivenronan#0c2498c6a85e31a589b3e9a6a3616c52

--
Ronan Dunklau
Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> Following the bug report at [1], I sent the attached patch to pgsql-bugs
> mailing list. I'm starting a thread here to add it to the next commitfest.

That link didn't work easily for me (possibly because it got split across
lines). Here's another one for anybody having similar issues:
https://www.postgresql.org/message-id/flat/CABs3KGQnOkyQ42-zKQqiE7M0Ks9oWDSee%3D%2BJx3-TGq%3D68xqWYw%40mail.gmail.com

> The problem I'm trying to solve is that, contrary to btree, gist and sp-gist
> indexes, gin indexes do not charge any cpu-cost for descending the entry tree.

As I said in the bug report thread, I think we really need to take a look
at all of our index AMs, not just GIN.  I extended your original reproducer
script to check all the AMs (attached), and suppressed memoize because it
seemed to behave differently for different AMs.  Here's what I see for the
estimated costs of the inner indexscan, and the actual runtime, for each:

btree:
   ->  Index Only Scan using t1_btree_index on t1  (cost=0.28..0.30 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=20000)
 Execution Time: 19.763 ms

gin (gin_trgm_ops):
   ->  Bitmap Heap Scan on t1  (cost=0.01..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=20000)
         ->  Bitmap Index Scan on t1_gin_index  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=20000)
 Execution Time: 75.216 ms

gist:
   ->  Index Only Scan using t1_gist_index on t1  (cost=0.14..0.16 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=20000)
 Execution Time: 277.799 ms

spgist:
   ->  Index Only Scan using t1_spgist_index on t1  (cost=0.14..0.16 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=20000)
 Execution Time: 51.407 ms

hash:
   ->  Index Scan using t1_hash_index on t1  (cost=0.00..0.02 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=20000)
 Execution Time: 13.090 ms

brin:
   ->  Bitmap Heap Scan on t1  (cost=0.03..18.78 rows=1 width=4) (actual time=0.049..0.093 rows=1 loops=20000)
         ->  Bitmap Index Scan on t1_brin_index  (cost=0.00..0.03 rows=1500 width=0) (actual time=0.003..0.003 rows=70 loops=20000)
 Execution Time: 1890.161 ms

bloom:
   ->  Bitmap Heap Scan on t1  (cost=11.25..11.26 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=20000)
         ->  Bitmap Index Scan on t1_bloom_index  (cost=0.00..11.25 rows=1 width=0) (actual time=0.003..0.003 rows=2 loops=20000)
 Execution Time: 88.703 ms

(These figures shouldn't be trusted too much because I did nothing to
suppress noise.  They seem at least somewhat reproducible, though.)

So, taking btree as our reference point, gin has clearly got a problem
because it's estimating less than a tenth as much cost despite actually
being nearly 4X slower.  gist and spgist are not as bad off, but
nonetheless they claim to be cheaper than btree when they are not.
The result for hash looks suspicious as well, though at least we'd make
the right index choice for this particular case.  brin and bloom
correctly report being a lot more expensive than btree, so at least for
the moment I'm not worried about them.

BTW, the artificially small random_page_cost doesn't really affect this
much.  If I set it to a perfectly reasonable value like 1.0, gin produces
a saner cost estimate but gist, spgist, and hash do not change their
estimates at all.  btree's estimate doesn't change either, which seems
like it might be OK for index-only scans but I doubt I believe it for
index scans.  In any case, at least one of gin and hash is doing it wrong.

In short, I think gist and spgist probably need a minor tweak to estimate
more CPU cost than they do now, and hash needs a really hard look at
whether it's sane at all.  That's all orthogonal to the merits of your
patch for gin, so I'll respond separately about that.
			regards, tom lane

CREATE EXTENSION btree_gist;
CREATE EXTENSION pg_trgm;
CREATE EXTENSION bloom;

drop table if exists t1, t2;
CREATE TABLE t1 ( id text );
CREATE TABLE t2 ( id text primary key, t1_id text );
insert into t1 (id) select i::text FROM generate_series(1, 1500) as i;
insert into t2 (id, t1_id) SELECT i::text, (i % 1500 + 1)::text FROM generate_series(1, 20000) i;
ANALYZE t1, t2;

SET random_page_cost = 0.00000001;
-- SET random_page_cost = 1.0;
SET enable_hashjoin = off;
SET enable_mergejoin = off;
SET enable_memoize = off;

CREATE INDEX t1_btree_index ON t1 USING btree (id);
explain (analyze, buffers) select * from t1 join t2 on t1.id = t2.t1_id;
DROP INDEX t1_btree_index;

CREATE INDEX t1_gin_index ON t1 USING gin (id gin_trgm_ops);
explain (analyze, buffers) select * from t1 join t2 on t1.id = t2.t1_id;
DROP INDEX t1_gin_index;

CREATE INDEX t1_gist_index ON t1 USING gist (id);
explain (analyze, buffers) select * from t1 join t2 on t1.id = t2.t1_id;
DROP INDEX t1_gist_index;

CREATE INDEX t1_spgist_index ON t1 USING spgist (id);
explain (analyze, buffers) select * from t1 join t2 on t1.id = t2.t1_id;
DROP INDEX t1_spgist_index;

CREATE INDEX t1_hash_index ON t1 USING hash (id);
explain (analyze, buffers) select * from t1 join t2 on t1.id = t2.t1_id;
DROP INDEX t1_hash_index;

CREATE INDEX t1_brin_index ON t1 USING brin (id);
explain (analyze, buffers) select * from t1 join t2 on t1.id = t2.t1_id;
DROP INDEX t1_brin_index;

CREATE INDEX t1_bloom_index ON t1 USING bloom (id);
explain (analyze, buffers) select * from t1 join t2 on t1.id = t2.t1_id;
DROP INDEX t1_bloom_index;
Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> The problem I'm trying to solve is that, contrary to btree, gist and sp-gist
> indexes, gin indexes do not charge any cpu-cost for descending the entry tree.

I looked this over briefly.  I think you are correct to charge an
initial-search cost per searchEntries count, but don't we also need to
scale up by arrayScans, similar to the "corrections for cache effects"?

+ * We model index descent costs similarly to those for btree, but we also
+ * need an idea of the tree_height.
+ * We use numEntries / numEntryPages as the fanout factor.

I'm not following that calculation?  It seems like it'd be correct only
for a tree height of 1, although maybe I'm just misunderstanding this
(overly terse, perhaps) comment.

+ * We charge descentCost once for every entry
+ */
+ if (numTuples > 1)
+ {
+     descentCost = ceil(log(numTuples) / log(2.0)) * cpu_operator_cost;
+     *indexStartupCost += descentCost * counts.searchEntries;
+ }

I had to read this twice before absorbing the point of the numTuples
test.  Maybe help the reader a bit:

+ if (numTuples > 1) /* ensure positive log() */

Personally I'd duplicate the comments from nbtreecostestimate rather than
just assuming the reader will go consult them.  For that matter, why
didn't you duplicate nbtree's logic for charging for SA scans?  This bit
seems just as relevant for GIN:

 * If there are ScalarArrayOpExprs, charge this once per SA scan.  The
 * ones after the first one are not startup cost so far as the overall
 * plan is concerned, so add them only to "total" cost.

Keep in mind also that pgindent will have its own opinions about how to
format these comments, and it can out-stubborn you.  Either run the
comments into single paragraphs, or if you really want them to be two
paras then leave an empty comment line between.  Another formatting
nitpick is that you seem to have added a number of unnecessary blank
lines.

			regards, tom lane
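To make the shape of the charge being reviewed concrete, here is a
back-of-the-envelope sketch in Python. The formula is taken from the patch
fragment quoted above; the function name and the 0.0025 default for
cpu_operator_cost are my own assumptions, not part of the patch.

```python
import math

# Hypothetical sketch of the quoted descent charge: ceil(log2(N))
# comparisons to descend the entry tree, charged once per search entry
# and added to the startup cost.
CPU_OPERATOR_COST = 0.0025  # assumed planner default

def entry_tree_descent_cost(num_tuples, search_entries):
    if num_tuples <= 1:  # ensure positive log()
        return 0.0
    descent = math.ceil(math.log(num_tuples) / math.log(2.0)) * CPU_OPERATOR_COST
    return descent * search_entries

# e.g. 1500 tuples and 3 search entries: ceil(log2(1500)) = 11
# comparisons per descent, so 11 * 0.0025 * 3 = 0.0825
print(entry_tree_descent_cost(1500, 3))
```

The `numTuples <= 1` guard mirrors the test Tom comments on: it only exists
to keep the log() argument positive.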
Thank you for looking at it.

> I looked this over briefly.  I think you are correct to charge an
> initial-search cost per searchEntries count, but don't we also need to
> scale up by arrayScans, similar to the "corrections for cache effects"?
>
> + * We model index descent costs similarly to those for btree, but we also
> + * need an idea of the tree_height.
> + * We use numEntries / numEntryPages as the fanout factor.
>
> I'm not following that calculation?  It seems like it'd be correct
> only for a tree height of 1, although maybe I'm just misunderstanding
> this (overly terse, perhaps) comment.

I don't really understand why that would work only with a tree height of
one? Every entry page contains a certain number of entries, and as such
computing the average number of entries per page seems to be a good
approximation of the fanout. But I may have misunderstood what is done for
other index types. For consistency, maybe we should just use a hard-coded
value of 100 for the fanout factor, similarly to what we do for other index
types.

But I realised that another approach might be better suited: since we want
to charge a cpu cost for every page visited, basing that on the already
estimated entryPagesFetched and dataPagesFetched would be better, instead
of copying what is done for other index types and estimating the tree
height. It would be simpler, as we don't need to estimate the tree height
anymore. I will submit a patch doing that.

> + * We charge descentCost once for every entry
> + */
> + if (numTuples > 1)
> + {
> +     descentCost = ceil(log(numTuples) / log(2.0)) * cpu_operator_cost;
> +     *indexStartupCost += descentCost * counts.searchEntries;
> + }
>
> I had to read this twice before absorbing the point of the numTuples
> test.  Maybe help the reader a bit:
>
> + if (numTuples > 1) /* ensure positive log() */

Ok. On second read, I think that part was actually wrong: what we care
about here is not the number of tuples, but the number of entries.
> Personally I'd duplicate the comments from nbtreecostestimate rather
> than just assuming the reader will go consult them.  For that matter,
> why didn't you duplicate nbtree's logic for charging for SA scans?
> This bit seems just as relevant for GIN:
>
>  * If there are ScalarArrayOpExprs, charge this once per SA scan.  The
>  * ones after the first one are not startup cost so far as the overall
>  * plan is concerned, so add them only to "total" cost.

You're right. So what we need to do here is scale up whatever we charge
for the startup cost by the number of array scans for the total cost.

> Keep in mind also that pgindent will have its own opinions about how to
> format these comments, and it can out-stubborn you.  Either run the
> comments into single paragraphs, or if you really want them to be two
> paras then leave an empty comment line between.  Another formatting
> nitpick is that you seem to have added a number of unnecessary blank
> lines.

Thanks, noted. I'll submit a new patch as soon as I've resolved some of
the problems I have when accounting for scalar array ops.

Best regards,

--
Ronan Dunklau
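For reference, the nbtree convention quoted above can be sketched like this
(a paraphrase with invented names, not the actual PostgreSQL C code):

```python
# Sketch of nbtree's ScalarArrayOpExpr charging convention: the descent
# cost is charged once as startup cost, but num_sa_scans times in the
# total cost, since rescans after the first are not startup cost from
# the overall plan's perspective. All names here are hypothetical.
def charge_descent(descent_cost, num_sa_scans):
    index_startup_cost = descent_cost
    index_total_cost = descent_cost * num_sa_scans
    return index_startup_cost, index_total_cost

# one descent in the startup cost, four descents in the total cost
print(charge_descent(1.0, 4))
```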
Le lundi 12 septembre 2022, 16:41:16 CEST Ronan Dunklau a écrit :
> But I realised that another approach might be better suited: since we
> want to charge a cpu cost for every page visited, basing that on the
> already estimated entryPagesFetched and dataPagesFetched would be better,
> instead of copying what is done for other index types and estimating the
> tree height. It would be simpler, as we don't need to estimate the tree
> height anymore.
>
> I will submit a patch doing that.

The attached does that and is much simpler. I only took into account
entryPagesFetched; I'm not sure if we should also charge something for
data pages. Instead of trying to estimate the height of the tree, we rely
on the (imperfect) estimation of the number of entry pages fetched, and
charge 50 times cpu_operator_cost for each of them, in addition to the
cpu_operator_cost charged per entry visited. I also adapted it to take
into account multiple scans induced by scalar array operations.

As it is, I don't understand the following calculation:

    /*
     * Estimate number of entry pages read.  We need to do
     * counts.searchEntries searches.  Use a power function as it should be,
     * but tuples on leaf pages usually is much greater. Here we include all
     * searches in entry tree, including search of first entry in partial
     * match algorithm
     */
    entryPagesFetched += ceil(counts.searchEntries * rint(pow(numEntryPages, 0.15)));

Is the power(0.15) used as an approximation for a log? If so, why? Also,
shouldn't we round that up? It seems unlikely to affect the total too much
in normal cases (adding at worst random_page_cost), but if we start to
charge cpu operator costs as proposed here it makes a big difference, and
it is probably safer to overestimate a bit than the opposite.

With those changes, the gin cost (purely cpu-wise) stays above the btree
one, as I think it should.

--
Ronan Dunklau
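A back-of-the-envelope version of that charging scheme can be written down
as follows. The 50x-per-page and per-entry charges come from the description
above; the names and the assumed cpu_operator_cost default are mine.

```python
# Sketch of the revised approach described above: charge
# 50 * cpu_operator_cost per entry page expected to be fetched, plus
# cpu_operator_cost per entry visited. Purely illustrative.
CPU_OPERATOR_COST = 0.0025  # assumed planner default

def gin_entry_tree_cpu_cost(entry_pages_fetched, search_entries):
    page_cpu = 50.0 * CPU_OPERATOR_COST * entry_pages_fetched
    entry_cpu = CPU_OPERATOR_COST * search_entries
    return page_cpu + entry_cpu

# 2 entry pages, 3 search entries: 50*0.0025*2 + 0.0025*3 = 0.2575
print(gin_entry_tree_cpu_cost(2, 3))
```

The point of basing the charge on entryPagesFetched is that no separate
tree-height estimate is needed: the page count already grows with the size
of the entry tree.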
Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> The attached does that and is much simpler. I only took into account
> entryPagesFetched, not sure if we should also charge something for data
> pages.

I think this is wrong, because there is already a CPU charge based on the
number of tuples visited, down at the very end of the routine:

    *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);

It's certainly possible to argue that that's incorrectly computed, but
just adding another cost charge for the same topic can't be right.

I do suspect that that calculation is bogus, because it looks like it's
based on the concept of "apply the quals at each index entry", which we
know is not how GIN operates.  So maybe we should drop that bit in favor
of a per-page-ish cost like you have here.  Not sure.  In any case it
seems orthogonal to the question of startup/descent costs.  Maybe we'd
better tackle just one thing at a time.

(BTW, given that that charge does exist and is not affected by
repeated-scan amortization, why do we have a problem in the first place?
Is it just too small?  I guess that when we're only expecting one tuple
to be retrieved, it would only add about cpu_index_tuple_cost.)

> Is the power(0.15) used an approximation for a log ? If so why ? Also
> shouldn't we round that up ?

No idea, but I'm pretty hesitant to just randomly fool with that equation
when (a) neither of us know where it came from and (b) exactly no evidence
has been provided that it's wrong.

I note for instance that the existing logic switches from charging 1 page
per search to 2 pages per search at numEntryPages = 15 (1.5 ^ (1/0.15)).
Your version would switch at 2 pages, as soon as the pow() result is even
fractionally above 1.0.  Maybe the existing logic is too optimistic there,
but ceil() makes it too pessimistic I think.  I'd sooner tweak the power
constant than change from rint() to ceil().

			regards, tom lane
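Those crossover points are easy to check numerically. Python's round() uses
round-half-to-even like C's rint() in the default rounding mode, so a small
sketch (function names are mine) reproduces both behaviors:

```python
import math

# Where does each variant of the entry-pages-per-search estimate switch
# from 1 page to 2? The existing code uses rint(pow(numEntryPages, 0.15));
# the proposed change used ceil() instead.
def pages_rint(num_entry_pages):
    return round(num_entry_pages ** 0.15)   # round-half-even, like rint()

def pages_ceil(num_entry_pages):
    return math.ceil(num_entry_pages ** 0.15)

# rint() stays at 1 page until pow(n, 0.15) reaches 1.5, i.e. around
# n = 1.5 ** (1 / 0.15), just under 15; ceil() jumps to 2 as soon as
# the pow() result is fractionally above 1.0.
print(pages_rint(14), pages_rint(15))  # 1 2
print(pages_ceil(2))                   # 2
print(1.5 ** (1 / 0.15))               # ~14.9
```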
Le vendredi 16 septembre 2022, 22:04:59 CEST Tom Lane a écrit :
> Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> > The attached does that and is much simpler. I only took into account
> > entryPagesFetched, not sure if we should also charge something for
> > data pages.
>
> I think this is wrong, because there is already a CPU charge based on
> the number of tuples visited, down at the very end of the routine:
>
>     *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
>
> It's certainly possible to argue that that's incorrectly computed,
> but just adding another cost charge for the same topic can't be right.

I don't think it's the same thing. The entryPagesFetched is computed
independently of the selectivity and the number of tuples. As such, I
think it makes sense to use it to compute the cost of descending the entry
tree. As mentioned earlier, I don't really understand the formula for
computing entryPagesFetched. If we were to estimate the tree height to
compute the descent cost as I first proposed, I feel like we would use two
different metrics for what is essentially the same cost: something
proportional to the size of the entry tree.

> I do suspect that that calculation is bogus, because it looks like it's
> based on the concept of "apply the quals at each index entry", which we
> know is not how GIN operates.  So maybe we should drop that bit in favor
> of a per-page-ish cost like you have here.  Not sure.  In any case it
> seems orthogonal to the question of startup/descent costs.  Maybe we'd
> better tackle just one thing at a time.

Hum, good point. Maybe that should be revisited too.

> (BTW, given that that charge does exist and is not affected by
> repeated-scan amortization, why do we have a problem in the first place?
> Is it just too small?  I guess that when we're only expecting one tuple
> to be retrieved, it would only add about cpu_index_tuple_cost.)
Because with a very low selectivity, we end up under-charging for the cost
of walking the entry tree by a significant amount. As said above, I don't
see how those two things are the same: that charge estimates the cost of
applying index quals to the visited tuples, which is not the same as
charging per entry page visited.

> > Is the power(0.15) used an approximation for a log ? If so why ? Also
> > shouldn't we round that up ?
>
> No idea, but I'm pretty hesitant to just randomly fool with that equation
> when (a) neither of us know where it came from and (b) exactly no
> evidence has been provided that it's wrong.
>
> I note for instance that the existing logic switches from charging 1 page
> per search to 2 pages per search at numEntryPages = 15 (1.5 ^ (1/0.15)).
> Your version would switch at 2 pages, as soon as the pow() result is even
> fractionally above 1.0.  Maybe the existing logic is too optimistic
> there, but ceil() makes it too pessimistic I think.  I'd sooner tweak the
> power constant than change from rint() to ceil().

You're right, I was too eager to try to raise the CPU cost proportionally
to the number of array scans (scalararrayop). I'd really like to
understand where this equation comes from though...

Best regards,

--
Ronan Dunklau
On Mon, Sep 19, 2022 at 09:15:25AM +0200, Ronan Dunklau wrote:
> You're right, I was too eager to try to raise the CPU cost proportionally
> to the number of array scans (scalararrayop). I'd really like to
> understand where this equation comes from though...

So, what's the latest update here?
--
Michael
> > You're right, I was too eager to try to raise the CPU cost
> > proportionally to the number of array scans (scalararrayop). I'd
> > really like to understand where this equation comes from though...
>
> So, what's the latest update here?

Thanks Michael for reviving this thread.

Before proceeding any further with this, I'd like to understand where we
stand. Tom argued that my way of charging a cost per entry page visited
boils down to charging per tuple, which I expressed disagreement with.

If we can come to a consensus on whether that's a bogus way of thinking
about it, I'll proceed with what we agree on.

--
Ronan Dunklau
Hi, Ronan!
On Wed, Oct 12, 2022 at 10:15 AM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> > > You're right, I was too eager to try to raise the CPU cost
> > > proportionally to the number of array scans (scalararrayop). I'd
> > > really like to understand where this equation comes from though...
> >
> > So, what's the latest update here?
>
> Thanks Michael for reviving this thread.
>
> Before proceeding any further with this, I'd like to understand where we
> stand. Tom argued my way of charging cost per entry pages visited boils down
> to charging per tuple, which I expressed disagreement with.
>
> If we can come to a consensus whether that's a bogus way of thinking about it
> I'll proceed with what we agree on.
I briefly read the thread. I think this line is a copy-paste from other
index access methods, trying to estimate the whole index-scan CPU cost
while bypassing all the details.
*indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
I think Tom's point was that it's wrong to add a separate entry-tree CPU cost estimation to another estimation, which tries (very inadequately) to estimate the whole scan cost. Instead, I propose writing better estimations for both entry-tree CPU cost and data-trees CPU cost and replacing existing CPU estimation altogether.
------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I think Tom's point was that it's wrong to add a separate entry-tree CPU
> cost estimation to another estimation, which tries (very inadequately) to
> estimate the whole scan cost. Instead, I propose writing better
> estimations for both entry-tree CPU cost and data-trees CPU cost and
> replacing existing CPU estimation altogether.

Great idea, if someone is willing to do it ...

			regards, tom lane
Le mardi 25 octobre 2022, 16:18:57 CET Tom Lane a écrit :
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > I think Tom's point was that it's wrong to add a separate entry-tree
> > CPU cost estimation to another estimation, which tries (very
> > inadequately) to estimate the whole scan cost. Instead, I propose
> > writing better estimations for both entry-tree CPU cost and data-trees
> > CPU cost and replacing existing CPU estimation altogether.
>
> Great idea, if someone is willing to do it ...

Hello,

Sorry for the delay, but here is an updated patch which changes the
costing in the following way:

- a descent cost similar to the btree one is charged for the initial
  entry-tree descent
- additionally, a charge is applied per page in both the entry tree and
  the posting trees / lists
- instead of charging the quals to each tuple, we charge them per entry
  only. We still charge cpu_index_tuple_cost per tuple though.

With those changes, there is no need to tweak the magic number formula
estimating the number of pages. Maybe we can come up with something better
for estimating those later on?

Best regards,

--
Ronan Dunklau
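The resulting model can be summarized in a rough sketch. All names and the
aggregation into one function are hypothetical simplifications; the real
patch works on GIN's internal page and entry estimates, and the constants
here are assumed planner defaults.

```python
import math

# Simplified sketch of the costing changes listed above: a btree-style
# descent charge, a per-page CPU charge over entry and posting pages,
# quals charged per entry, and cpu_index_tuple_cost still per tuple.
CPU_OPERATOR_COST = 0.0025      # assumed default
CPU_INDEX_TUPLE_COST = 0.005    # assumed default
PAGE_CPU_MULTIPLIER = 50.0      # the per-page multiplier discussed in the thread

def gin_cpu_cost(num_entries, search_entries, pages_fetched,
                 tuples_returned, qual_op_cost):
    descent = 0.0
    if num_entries > 1:  # ensure positive log()
        descent = math.ceil(math.log2(num_entries)) * CPU_OPERATOR_COST
    page_cpu = PAGE_CPU_MULTIPLIER * CPU_OPERATOR_COST * pages_fetched
    qual_cpu = qual_op_cost * search_entries       # per entry, not per tuple
    tuple_cpu = CPU_INDEX_TUPLE_COST * tuples_returned
    return descent * search_entries + page_cpu + qual_cpu + tuple_cpu

# 1500 entries, 1 search entry, 2 pages, 1 tuple, qual cost 0.0025:
# 0.0275 + 0.25 + 0.0025 + 0.005 = 0.285
print(gin_cpu_cost(1500, 1, 2, 1, 0.0025))
```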
Hi, Ronan!

On Fri, Dec 2, 2022 at 1:19 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> Sorry for the delay, but here is an updated patch which changes the
> costing in the following way:
>
> - add a descent cost similar to the btree one is charged for the initial
>   entry-tree
> - additionally, a charge is applied per page in both the entry tree and
>   posting trees / lists
> - instead of charging the quals to each tuple, charge them per entry
>   only. We still charge cpu_index_tuple_cost per tuple though.
>
> With those changes, no need to tweak the magic number formula estimating
> the number of pages. Maybe we can come up with something better for
> estimating those later on ?

Thank you for your patch. Couple of quick questions.

1) What magic number 50.0 stands for? I think we at least should make it a
macro.

2) "We only charge one data page for the startup cost" – should this be
dependent on number of search entries?

------
Regards,
Alexander Korotkov
Le vendredi 2 décembre 2022, 12:33:33 CET Alexander Korotkov a écrit :
> Hi, Ronan!
> Thank you for your patch. Couple of quick questions.
> 1) What magic number 50.0 stands for? I think we at least should make
> it a macro.

This is what is used in other tree-descending estimation functions, so I
used that too. Maybe a DEFAULT_PAGE_CPU_COST macro would work for both? If
so, I'll separate this into two patches: one introducing the macro for the
other estimation functions, and this patch for gin.

> 2) "We only charge one data page for the startup cost" – should this
> be dependent on number of search entries?

Good point, multiplying it by the number of search entries would do the
trick.

Thank you for looking at this!

Regards,

--
Ronan Dunklau
Le vendredi 2 décembre 2022, 13:58:27 CET Ronan Dunklau a écrit :
> Le vendredi 2 décembre 2022, 12:33:33 CET Alexander Korotkov a écrit :
> > Hi, Ronan!
> > Thank you for your patch. Couple of quick questions.
> > 1) What magic number 50.0 stands for? I think we at least should make
> > it a macro.
>
> This is what is used in other tree-descending estimation functions, so I
> used that too. Maybe a DEFAULT_PAGE_CPU_COST macro would work for both ?
> If so I'll separate this into two patches, one introducing the macro for
> the other estimation functions, and this patch for gin.

The 0001 patch does this.

> > 2) "We only charge one data page for the startup cost" – should this
> > be dependent on number of search entries?

In fact there was another problem. The current code estimates two
different paths for fetching data pages: in the case of a partial match,
it takes into account that all the data pages will have to be fetched. So
this is now taken into account for the CPU cost as well.

For the regular search, we scale the number of data pages by the number of
search entries.

Best regards,

--
Ronan Dunklau
On Tue, Dec 6, 2022 at 1:22 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> The 0001 patch does this.
>
> In fact there was another problem. The current code estimates two
> different paths for fetching data pages: in the case of a partial match,
> it takes into account that all the data pages will have to be fetched.
> So this is now taken into account for the CPU cost as well.
>
> For the regular search, we scale the number of data pages by the number
> of search entries.

Now the patch looks good to me. I made some tests.
# create extension pg_trgm;
# create extension btree_gin;
# create table test1 as (select random() as val from generate_series(1,1000000) i);
# create index test1_gin_idx on test1 using gin (val);
# explain select * from test1 where val between 0.1 and 0.2;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test1  (cost=1186.21..7089.57 rows=98557 width=8)
   Recheck Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
   ->  Bitmap Index Scan on test1_gin_idx  (cost=0.00..1161.57 rows=98557 width=0)
         Index Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
(4 rows)

# create index test1_btree_idx on test1 using btree (val);
# explain select * from test1 where val between 0.1 and 0.2;
                                        QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Only Scan using test1_btree_idx on test1  (cost=0.42..3055.57 rows=98557 width=8)
   Index Cond: ((val >= '0.1'::double precision) AND (val <= '0.2'::double precision))
(2 rows)

Looks reasonable. In this case GIN is much more expensive, because it
can't handle a range query properly and overlaps two partial matches.
# create table test2 as (select 'samplestring' || i as val from generate_series(1,1000000) i);
# create index test2_gin_idx on test2 using gin (val);
# explain select * from test2 where val = 'samplestring500000';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=20.01..24.02 rows=1 width=18)
   Recheck Cond: (val = 'samplestring500000'::text)
   ->  Bitmap Index Scan on test2_gin_idx  (cost=0.00..20.01 rows=1 width=0)
         Index Cond: (val = 'samplestring500000'::text)
(4 rows)

# create index test2_btree_idx on test2 using btree (val);
# explain select * from test2 where val = 'samplestring500000';
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Index Only Scan using test2_btree_idx on test2  (cost=0.42..4.44 rows=1 width=18)
   Index Cond: (val = 'samplestring500000'::text)
(2 rows)

This also looks reasonable. GIN is not terribly bad for this case, but
B-tree is much cheaper.

I'm going to push this and backpatch to all supported versions if no
objections.

------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I'm going to push this and backpatch to all supported versions if no
> objections.

Push yes, but I'd counsel against back-patching.  People don't generally
like unexpected plan changes in stable versions, and that's what a costing
change could produce.  There's no argument that we are fixing a failure or
wrong answer here, so it doesn't seem like back-patch material.

			regards, tom lane
On Sun, Jan 8, 2023 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > I'm going to push this and backpatch to all supported versions if no
> > objections.
>
> Push yes, but I'd counsel against back-patching.  People don't
> generally like unexpected plan changes in stable versions, and
> that's what a costing change could produce.  There's no argument
> that we are fixing a failure or wrong answer here, so it doesn't
> seem like back-patch material.

Pushed to master, thank you. I've noted the reason for not back-patching
in the commit message.

------
Regards,
Alexander Korotkov