Re: Fix gin index cost estimation - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: Fix gin index cost estimation
Date
Msg-id 4212593.ejJDZkT8p0@aivenronan
Whole thread Raw
In response to Re: Fix gin index cost estimation  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses Re: Fix gin index cost estimation
List pgsql-hackers
Le lundi 12 septembre 2022, 16:41:16 CEST Ronan Dunklau a écrit :
> But I realised that another approach might be better suited: since we want
to
> charge a cpu cost for every page visited, actually basing that on the
already
> estimated entryPagesFetched and dataPagesFetched would be better, instead of
> copying what is done for other indexes type and estimating the tree height.
It
> would be simpler, as we don't need to estimate the tree height anymore.
>
> I will submit a patch doing that.

The attached does that and is much simpler. I only took into account
entryPagesFetched, not sure if we should also charge something for data pages.

Instead of trying to estimate the height of the tree, we rely on the
(imperfect) estimation of the number of entry pages fetched, and charge 50
times cpu_operator_cost to that, in addition to the cpu_operator_cost charged
per entry visited.

I also adapted to take into accounts multiple scans induced by scalar array
operations.

As it is, I don't understand the following calculation:

/*
* Estimate number of entry pages read.  We need to do
* counts.searchEntries searches.  Use a power function as it should be,
 * but tuples on leaf pages usually is much greater. Here we include all
 * searches in entry tree, including search of first entry in partial
 * match algorithm
 */
entryPagesFetched += ceil(counts.searchEntries * rint(pow(numEntryPages,
0.15)));

Is the power(0.15) used an approximation for a log ? If so why ? Also
shouldn't we round that up ?
It seems to me it's unlikely to affect the total too much in normal cases
(adding at worst random_page_cost) but if we start to charge cpu operator
costs as proposed here it makes a big difference and it is probably
safer to overestimate a bit than the opposite.

With those changes, the gin cost (purely cpu-wise) stays above the btree one
as I think it should be.

--
Ronan Dunklau
Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: SUBTRANS: Minimizing calls to SubTransSetParent()
Next
From: Bharath Rupireddy
Date:
Subject: Re: Assertion failure in WaitForWALToBecomeAvailable state machine