Re: [HACKERS] BRIN cost estimate - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] BRIN cost estimate
Date
Msg-id CAKJS1f9=-dv8wGHZ8KU1KnV1K0vFuy8JOcH_s774W61TJvbOfA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] BRIN cost estimate  (Emre Hasegeli <emre@hasegeli.com>)
Responses Re: [HACKERS] BRIN cost estimate  (Emre Hasegeli <emre@hasegeli.com>)
Re: BRIN cost estimate  (Emre Hasegeli <emre@hasegeli.com>)
List pgsql-hackers
On 17 March 2017 at 23:50, Emre Hasegeli <emre@hasegeli.com> wrote:
>> 1.
>>
>> + Assert(nnumbers == 1);
>>
>> I think its a bad idea to Assert() this. The stat tuple can come from
>> a plugin which could do anything. Seems like if we need to be certain
>> of that then it should be an elog(ERROR), maybe mention that we
>> expected a 1 element array, but got <nnumbers> elements.
>
> But it is not coming from a plugin which can do anything.  It is the
> plugin we know.  Assert() is exact same coding with btcostestimate().

Not sure what you mean here. I'm not speaking of the brin index am, I
mean the get_index_stats_hook call which you've added. An extension
can be loaded which sets this hook and provides its own tuple, which
may cause the Assert to fail. Asserts are normally used to verify in
assert enabled builds that would cause some following code to crash or
not do what it's meant to. Normally they're used to help track down
bugs in the software, they're not meant to track bugs in some software
we have no control over.

>> 2.
>>
>> + Assert(varCorrelation >= 0 && varCorrelation <= 1);
>>
>> same goes for that. I don't think we want to Assert() that either.
>> elog(ERROR) seems better, or perhaps we should fall back on the old
>> estimation method in this case?
>
> Again the correlation value is expected to be in this range.  I don't
> think we even need to bother with Assert().  Garbage in garbage out.

That's fine. Let's just not garbage crash.

>> The Asserted range also seems to contradict the range mentioned in
>> pg_statistic.h:
>
> We are using Abs() of the value.

I missed that.

>> 3.
>>
>> + numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
>>
>> should numBlocks be named numRanges? After all we call the option
>> "pages_per_range".
>
> It was named this way before.

hmm, before what exactly? before your patch it didn't exist. You
introduced it into brincostestimate().

Here's the line of code:

+ double numBlocks;

If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.

>> 6.
>>
>> Might want to consider not applying this estimate when the statistics
>> don't contain any STATISTIC_KIND_CORRELATION array.
>
> I am not sure what would be better than using the value as 0 in this case.

At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.

> I can look into supporting expression indexes, if you think something
> very much incomplete like this has a chance to get committed.

I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.

I've not looked at the new patch yet, but thanks for making some of
those changes.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Michael Banck
Date:
Subject: Re: [HACKERS] Create replication slot in pg_basebackup if requestedand not yet present
Next
From: Dilip Kumar
Date:
Subject: Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?