Re: Fix gin index cost estimation - Mailing list pgsql-hackers
From | Ronan Dunklau
Subject | Re: Fix gin index cost estimation
Msg-id | 4768224.31r3eYUQgx@aivenronan
In response to | Re: Fix gin index cost estimation (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: Fix gin index cost estimation
List | pgsql-hackers
On Friday, September 16, 2022 at 22:04:59 CEST, Tom Lane wrote:
> Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> > The attached does that and is much simpler. I only took into account
> > entryPagesFetched, not sure if we should also charge something for
> > data pages.
>
> I think this is wrong, because there is already a CPU charge based on
> the number of tuples visited, down at the very end of the routine:
>
>     *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
>
> It's certainly possible to argue that that's incorrectly computed,
> but just adding another cost charge for the same topic can't be right.

I don't think it's the same thing. The entryPagesFetched is computed
independently of the selectivity and the number of tuples. As such, I think
it makes sense to use it to compute the cost of descending the entry tree.

As mentioned earlier, I don't really understand the formula for computing
entryPagesFetched. If we were to estimate the tree height to compute the
descent cost as I first proposed, I feel like we would use two different
metrics for what is essentially the same cost: something proportional to
the size of the entry tree.

> I do suspect that that calculation is bogus, because it looks like it's
> based on the concept of "apply the quals at each index entry", which we
> know is not how GIN operates. So maybe we should drop that bit in favor
> of a per-page-ish cost like you have here. Not sure. In any case it
> seems orthogonal to the question of startup/descent costs. Maybe we'd
> better tackle just one thing at a time.

Hum, good point. Maybe that should be revisited too.

> (BTW, given that that charge does exist and is not affected by
> repeated-scan amortization, why do we have a problem in the first place?
> Is it just too small? I guess that when we're only expecting one tuple
> to be retrieved, it would only add about cpu_index_tuple_cost.)
Because with a very low selectivity, we end up under-charging for the cost
of walking the entry tree by a significant amount. As said above, I don't
see how those two things are the same: that charge estimates the cost of
applying index quals to the visited tuples, which is not the same as
charging per entry page visited.

> > Is the power(0.15) used an approximation for a log ? If so why ? Also
> > shouldn't we round that up ?
>
> No idea, but I'm pretty hesitant to just randomly fool with that equation
> when (a) neither of us know where it came from and (b) exactly no evidence
> has been provided that it's wrong.
>
> I note for instance that the existing logic switches from charging 1 page
> per search to 2 pages per search at numEntryPages = 15 (1.5 ^ (1/0.15)).
> Your version would switch at 2 pages, as soon as the pow() result is even
> fractionally above 1.0. Maybe the existing logic is too optimistic there,
> but ceil() makes it too pessimistic I think. I'd sooner tweak the power
> constant than change from rint() to ceil().

You're right, I was too eager to try to raise the CPU cost proportionally
to the number of array scans (scalararrayop). I'd really like to understand
where this equation comes from though...

Best regards,

--
Ronan Dunklau