Re: Hypothetical indexes using BRIN broken since pg10 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Hypothetical indexes using BRIN broken since pg10
Date
Msg-id b847493e-d263-3f2e-1802-689e778c9a58@iki.fi
Whole thread Raw
In response to Re: Hypothetical indexes using BRIN broken since pg10  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Hypothetical indexes using BRIN broken since pg10  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On 27/06/2019 23:09, Alvaro Herrera wrote:
> On 2019-Jun-27, Tom Lane wrote:
> 
>> Dunno, I just can't get excited about exposing REVMAP_PAGE_MAXITEMS.
>> Especially not since we seem to agree on the long-term solution here,
>> and what you're suggesting to Julien doesn't particularly fit into
>> that long-term solution.
> 
> Well, it was brin_page.h, which is supposed to be internal to BRIN
> itself.  But since we admit that in its current state selfuncs.c is not
> salvageable as a module and we'll redo the whole thing in the short
> term, I withdraw my comment.

There seems to be consensus on the going with the approach from the 
original patch, so I had a closer look at it.

The patch first does this:

> 
>     /*
>      * Obtain some data from the index itself, if possible.  Otherwise invent
>      * some plausible internal statistics based on the relation page count.
>      */
>     if (!index->hypothetical)
>     {
>         indexRel = index_open(index->indexoid, AccessShareLock);
>         brinGetStats(indexRel, &statsData);
>         index_close(indexRel, AccessShareLock);
>     }
>     else
>     {
>         /*
>          * Assume default number of pages per range, and estimate the number
>          * of ranges based on that.
>          */
>         indexRanges = Max(ceil((double) baserel->pages /
>                                BRIN_DEFAULT_PAGES_PER_RANGE), 1.0);
> 
>         statsData.pagesPerRange = BRIN_DEFAULT_PAGES_PER_RANGE;
>         statsData.revmapNumPages = (indexRanges / REVMAP_PAGE_MAXITEMS) + 1;
>     }
>    ...

And later in the function, there's this:

>    /* work out the actual number of ranges in the index */
>    indexRanges = Max(ceil((double) baserel->pages / statsData.pagesPerRange),
>                      1.0);

It seems a bit error-prone that essentially the same formula is used 
twice in the function, to compute 'indexRanges', with some distance 
between them. Perhaps some refactoring would help with, although I'm not 
sure what exactly would be better. Maybe move the second computation 
earlier in the function, like in the attached patch?

The patch assumes the default pages_per_range setting, but looking at 
the code at https://github.com/HypoPG/hypopg, the extension actually 
takes pages_per_range into account when it estimates the index size. I 
guess there's no easy way to pass the pages_per_range setting down to 
brincostestimate(). I'm not sure what we should do about that, but seems 
that just using BRIN_DEFAULT_PAGES_PER_RANGE here is not very accurate.

The attached patch is based on PG v11, because I tested this with 
https://github.com/HypoPG/hypopg, and it didn't compile with later 
versions. There's a small difference in the locking level used between 
v11 and 12, which makes the patch not apply across versions, but that's 
easy to fix by hand.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: make libpq documentation navigable between functions
Next
From: Julien Rouhaud
Date:
Subject: Re: Hypothetical indexes using BRIN broken since pg10