Re: Hypothetical indexes using BRIN broken since pg10 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hypothetical indexes using BRIN broken since pg10
Date
Msg-id 23475.1567470067@sss.pgh.pa.us
Whole thread Raw
In response to Re: Hypothetical indexes using BRIN broken since pg10  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Hypothetical indexes using BRIN broken since pg10
List pgsql-hackers
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Fri, Jul 26, 2019 at 1:34 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> The patch assumes the default pages_per_range setting, but looking at
>> the code at https://github.com/HypoPG/hypopg, the extension actually
>> takes pages_per_range into account when it estimates the index size. I
>> guess there's no easy way to pass the pages_per_range setting down to
>> brincostestimate(). I'm not sure what we should do about that, but seems
>> that just using BRIN_DEFAULT_PAGES_PER_RANGE here is not very accurate.

> Yes, hypopg can use a custom pages_per_range as it intercepts it when
> the hypothetical index is created.  I didn't find any way to get that
> information in brincostestimate(), especially for something that could
> backpatched.  I don't like it, but I don't see how to do better than
> just using BRIN_DEFAULT_PAGES_PER_RANGE :(

I can tell you what I think ought to happen, but making it happen might
be more work than this patch should take on.

The right answer IMO is basically for the brinGetStats call to go
away from brincostestimate and instead happen during plancat.c's
building of the IndexOptInfo.  In the case of a hypothetical index,
it'd fall to the get_relation_info_hook to fill in suitable fake
data.  Sounds simple, but:

1. We really don't want even more AM-specific knowledge in plancat.c.
So I think the right way to do this would be something along the
line of adding a "void *amdata" field to IndexOptInfo, and adding
an AM callback to be called during get_relation_info that's allowed
to fill that in with some AM-specific data (which the AM's costestimate
routine would know about).  The existing btree-specific hacks in
get_relation_info should migrate into btree's version of this callback,
and IndexOptInfo.tree_height should probably go away in favor of
keeping that in btree's version of the amdata struct.

2. This approach puts a premium on the get_relation_info callback
being cheap, because there's no certainty that the data it fills
into IndexOptInfo.amdata will ever get used.  For btree, the 
_bt_getrootheight call is cheap enough to not be a problem, because
it just looks at the metapage data that btree keeps cached in the
index's relcache entry.  The same cannot be said for brinGetStats
as it stands: it goes off to read the index metapage.  There are
at least two ways to fix that:

2a. Teach brin to keep the metapage cached like btree does.
This seems like it could be a performance win across the board,
but you'd need to work out invalidation behavior, and it'd be
a bit invasive.

2b. Define IndexOptInfo.amdata as being filled lazily, that is
brincostestimate will invoke brinGetStats and fill in the data
if the pointer is currently NULL.  Then a hypothetical-index
plugin could override that by pre-filling the field with the
desired fake data.

I don't have a problem with allowing brincostestimate to fill
in defaults based on BRIN_DEFAULT_PAGES_PER_RANGE if it sees
that amdata is null and the index is hypothetical.  But there
should be a way for the get_relation_info_hook to do better.

BTW, the current patch doesn't apply according to the cfbot,
but I think it just needs a trivial rebase over 9c703c169
(ie, assume the index is already locked).  I didn't bother
to do that since what I recommend above would require a
lot more change in that area.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: pgbench - rework variable management
Next
From: Fujii Masao
Date:
Subject: Re: pg_waldump and PREPARE