Re: [HACKERS] BRIN cost estimate - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] BRIN cost estimate
Date
Msg-id CAKJS1f9_Y9cy_n4VMmite_-2zeayyM+mdjpvBKMJ1fg0-4MMXQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] BRIN cost estimate  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: [HACKERS] BRIN cost estimate  (Emre Hasegeli <emre@hasegeli.com>)
Re: [HACKERS] BRIN cost estimate  (Emre Hasegeli <emre@hasegeli.com>)
List pgsql-hackers
On 2 March 2017 at 04:40, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera wrote:
>
>> I have added this patch to the commitfest, which I've been intending to
>> get in for a long time.  I'll be submitting an updated patch, if needed.
>
> Here is Emre's patch rebased to current sources.

Looking over this now, and have noticed a couple of things:

1.

+ Assert(nnumbers == 1);

I think its a bad idea to Assert() this. The stat tuple can come from
a plugin which could do anything. Seems like if we need to be certain
of that then it should be an elog(ERROR), maybe mention that we
expected a 1 element array, but got <nnumbers> elements.

2.

+ Assert(varCorrelation >= 0 && varCorrelation <= 1);

same goes for that. I don't think we want to Assert() that either.
elog(ERROR) seems better, or perhaps we should fall back on the old
estimation method in this case?

The Asserted range also seems to contradict the range mentioned in
pg_statistic.h:

/** A "correlation" slot describes the correlation between the physical order* of table tuples and the ordering of data
valuesof this column, as seen* by the "<" operator identified by staop.  (As with the histogram, more* than one entry
couldtheoretically appear.) stavalues is not used and* should be NULL.  stanumbers contains a single entry, the
correlation*coefficient between the sequence of data values and the sequence of* their actual tuple positions.  The
coefficientranges from +1 to -1.*/
 
#define STATISTIC_KIND_CORRELATION 3

3.

+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);

should numBlocks be named numRanges? After all we call the option
"pages_per_range".

4.

+ * correlated table is copied 4 times, the correlation would be 0.25,
+ * although the index would be almost as good as the version on the

What's meant by "table is copied 4 times" ?

As best as I can work out, it means.

INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;

but I'm uncertain, and it's not very clear to me what it means.

5.

+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;

I missed the comment that explains why we divide by two here.

6.

Might want to consider not applying this estimate when the statistics
don't contain any STATISTIC_KIND_CORRELATION array.

In this case the estimation is still applied the same way, only the
*indexCorrelation is 0.

Consider:

CREATE TABLE r2 (values INT NOT NULL);
INSERT INTO r2 VALUES(1);
ANALYZE r2;
\x on
SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM pg_class
WHERE relname = 'r2');

Notice the lack of any stakind being set to 3.

7.

+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;

Perhaps the blockProportion should also be clamped to the number of
pages in the relation. Even a 1 page relation is estimated to have 128
pages with the default pages_per_range, by the method used in the
patch.

blockProportion = Max(blockProportion, baserel->relpages);

during my test the selectivity was set to 64.5, then clamped to 1 by
CLAMP_PROBABILITY(). This does not seem very nice.

Good news is, I'm seeing much better plans coming out in cases when
the index is unlikely to help. So +1 for fixing this issue.

Will Emre be around to make the required changes to the patch? I see
it's been a while since it was originally posted.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Transactions involving multiple postgres foreign servers
Next
From: Craig Ringer
Date:
Subject: Re: [HACKERS] PATCH: Batch/pipelining support for libpq