Home > mailing lists

Re: Fix gin index cost estimation - Mailing list pgsql-hackers

From	Ronan Dunklau
Subject	Re: Fix gin index cost estimation
Date	September 15, 2022 10:25:06
Msg-id	4212593.ejJDZkT8p0@aivenronan Whole thread Raw
In response to	Re: Fix gin index cost estimation (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses	Re: Fix gin index cost estimation
List	pgsql-hackers

Tree view

Le lundi 12 septembre 2022, 16:41:16 CEST Ronan Dunklau a écrit :
> But I realised that another approach might be better suited: since we want
to
> charge a cpu cost for every page visited, actually basing that on the
already
> estimated entryPagesFetched and dataPagesFetched would be better, instead of
> copying what is done for other indexes type and estimating the tree height.
It
> would be simpler, as we don't need to estimate the tree height anymore.
>
> I will submit a patch doing that.

The attached does that and is much simpler. I only took into account
entryPagesFetched, not sure if we should also charge something for data pages.

Instead of trying to estimate the height of the tree, we rely on the
(imperfect) estimation of the number of entry pages fetched, and charge 50
times cpu_operator_cost to that, in addition to the cpu_operator_cost charged
per entry visited.

I also adapted to take into accounts multiple scans induced by scalar array
operations.

As it is, I don't understand the following calculation:

/*
* Estimate number of entry pages read.  We need to do
* counts.searchEntries searches.  Use a power function as it should be,
 * but tuples on leaf pages usually is much greater. Here we include all
 * searches in entry tree, including search of first entry in partial
 * match algorithm
 */
entryPagesFetched += ceil(counts.searchEntries * rint(pow(numEntryPages,
0.15)));

Is the power(0.15) used an approximation for a log ? If so why ? Also
shouldn't we round that up ?
It seems to me it's unlikely to affect the total too much in normal cases
(adding at worst random_page_cost) but if we start to charge cpu operator
costs as proposed here it makes a big difference and it is probably
safer to overestimate a bit than the opposite.

With those changes, the gin cost (purely cpu-wise) stays above the btree one
as I think it should be.

--
Ronan Dunklau

Attachment

v2-0001-Fix-gin-costing.patch

pgsql-hackers by date:

From: Simon Riggs
Date: 15 September 2022, 10:04:14
Subject: Re: SUBTRANS: Minimizing calls to SubTransSetParent()

From: Bharath Rupireddy
Date: 15 September 2022, 10:25:54
Subject: Re: Assertion failure in WaitForWALToBecomeAvailable state machine

Re: Fix gin index cost estimation - Mailing list pgsql-hackers

Attachment

Previous

Next