Re: Fix gin index cost estimation - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: Fix gin index cost estimation
Date
Msg-id 1924159.usQuhbGJ8B@aivenronan
In response to Re: Fix gin index cost estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fix gin index cost estimation
List pgsql-hackers
Thank you for looking at it.

> I looked this over briefly.  I think you are correct to charge an
> initial-search cost per searchEntries count, but don't we also need to
> scale up by arrayScans, similar to the "corrections for cache effects"?
> 
> +     * We model index descent costs similarly to those for btree, but we also
> +     * need an idea of the tree_height.
> +     * We use numEntries / numEntryPages as the fanout factor.
> 
> I'm not following that calculation?  It seems like it'd be correct
> only for a tree height of 1, although maybe I'm just misunderstanding
> this (overly terse, perhaps) comment.

I don't really understand why that would work only with a tree height of one.
Every entry page contains a certain number of entries, so computing the
average number of entries per page seems like a good approximation of the
fanout. But I may have misunderstood what is done in other index types.

For consistency, maybe we should just use a hard-coded value of 100 for the
fanout factor, similarly to what we do for other index types.

But I realised that another approach might be better suited: since we want to
charge a CPU cost for every page visited, basing that on the already-estimated
entryPagesFetched and dataPagesFetched would be better than copying what is
done for other index types and estimating the tree height. It would also be
simpler, as we would not need to estimate the tree height at all.

I will submit a patch doing that.

> 
> +     * We charge descentCost once for every entry
> +     */
> +    if (numTuples > 1)
> +    {
> +        descentCost = ceil(log(numTuples) / log(2.0)) * cpu_operator_cost;
> +        *indexStartupCost += descentCost * counts.searchEntries;
> +    }
> 
> I had to read this twice before absorbing the point of the numTuples
> test.  Maybe help the reader a bit:
> 
> +    if (numTuples > 1)                /* ensure positive log() */
> 

Ok. On second read, I think that part was actually wrong: what we care about
here is not the number of tuples, but the number of entries.

> Personally I'd duplicate the comments from nbtreecostestimate rather
> than just assuming the reader will go consult them.  For that matter,
> why didn't you duplicate nbtree's logic for charging for SA scans?
> This bit seems just as relevant for GIN:
> 
>      * If there are ScalarArrayOpExprs, charge this once per SA scan.  The
>      * ones after the first one are not startup cost so far as the overall
>      * plan is concerned, so add them only to "total" cost.
> 

You're right. So what we need to do here is charge the descent cost once for
the startup cost, and scale it by the number of array scans for the total
cost.

> Keep in mind also that pgindent will have its own opinions about how to
> format these comments, and it can out-stubborn you.  Either run the
> comments into single paragraphs, or if you really want them to be two
> paras then leave an empty comment line between.  Another formatting
> nitpick is that you seem to have added a number of unnecessary blank
> lines.

Thanks, noted.

I'll submit a new patch soon, as soon as I've resolved some of the problems I
have when accounting for ScalarArrayOpExprs.

Best regards,

-- 
Ronan Dunklau
