Re: tableam: abstracting relation sizing code - Mailing list pgsql-hackers

From Robert Haas
Subject Re: tableam: abstracting relation sizing code
Date
Msg-id CA+TgmoZKe713aWGiKkx96LFyQQ1qSb4LMXAyREN34ZNwK68rEA@mail.gmail.com
Whole thread Raw
In response to Re: tableam: abstracting relation sizing code  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: tableam: abstracting relation sizing code
Re: tableam: abstracting relation sizing code
List pgsql-hackers
On Fri, Jun 7, 2019 at 4:11 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> Makes sense. Regarding one of the hacks:
>
> +        * HACK: if the relation has never yet been vacuumed, use a minimum size
> +        * estimate of 10 pages.  The idea here is to avoid assuming a
> +        * newly-created table is really small, even if it currently is, because
> +        * that may not be true once some data gets loaded into it.
>
> When this is a generic function for AM’s, would it make sense to make the
> minimum estimate a passed in value rather than hardcoded at 10?  I don’t have a
> case in mind, but I also don’t think that assumptions made for heap necessarily
> makes sense for all AM’s. Just thinking out loud.

I think that's probably going in the wrong direction.  It's arguable,
of course.  However, it seems to me that there's nothing heap-specific
about the number 10.  It's not computed based on the format of a heap
page or a heap tuple.  It's just somebody's guess (likely Tom's) about
how to plan with empty relations.  If somebody finds that another
number works better for their AM, it's probably also better for heap,
at least on that person's workload.  It seems more likely to me that
this needs to be a GUC or reloption than that it needs to be an
AM-specific property.  It also seems to me that if the parameters of
the hack randomly vary from one AM to another, it's likely to create
confusing plan differences that have nothing to do with the actual
merits of what the AMs are doing.  But all that being said, I'm not
blocking anybody from fooling around with this; nobody's obliged to
use the helper function.  It's just there for people who want the same
AM-independent logic that the heap uses.

> > Here is a patch that tries to improve the situation.  I don't know
> > whether there is some better approach; this seemed like the obvious
> > thing to do.
>
> A small nitpick on the patch:
>
> + * estimate_rel_size callback, because it has a few additional paramters.
>
> s/paramters/parameters/

Good catch, and now I notice that the callback is not called
estimate_rel_size but relation_estimate_size.  Updated patch attached;
thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: tableam: abstracting relation sizing code
Next
From: Robert Haas
Date:
Subject: Re: hyrax vs. RelationBuildPartitionDesc