Thread: [sqlsmith] Crash in mcv_get_match_bitmap

[sqlsmith] Crash in mcv_get_match_bitmap

From

Andreas Seltenreich

Date:

10 July 2019, 20:37:51

Hi,

running sqlsmith on the regression database of REL_12_STABLE at
ff597b656f yielded a crash in mcv_get_match_bitmap.  I can reproduce it
with the following query on the regression database:

    select filler1 from mcv_lists where a is not null and (select 42) <= c;

Backtrace below.

regards,
Andreas

Program received signal SIGSEGV, Segmentation fault.
pg_detoast_datum (datum=0x0) at fmgr.c:1741
(gdb) bt
#0  pg_detoast_datum (datum=0x0) at fmgr.c:1741
#1  0x000055b2bbeb2656 in numeric_le (fcinfo=0x7ffceeb2cb90) at numeric.c:2139
#2  0x000055b2bbf3cdca in FunctionCall2Coll (flinfo=flinfo@entry=0x7ffceeb2cc30, collation=collation@entry=100,
    arg1=<optimized out>, arg2=<optimized out>) at fmgr.c:1162
#3  0x000055b2bbdd7aec in mcv_get_match_bitmap (root=0x55b2bd2acff0, clauses=<optimized out>, keys=0x55b2bd2c4e38,
    mcvlist=0x55b2bd2c44e0, is_or=false) at mcv.c:1638
#4  0x000055b2bbdda581 in mcv_clauselist_selectivity (root=root@entry=0x55b2bd2acff0, stat=stat@entry=0x55b2bd2c4e00,
    clauses=clauses@entry=0x55b2bd2c5298, varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
sjinfo=sjinfo@entry=0x0,
    rel=0x55b2bd2c4158, basesel=0x7ffceeb2cd70, totalsel=0x7ffceeb2cd78) at mcv.c:1876
#5  0x000055b2bbdd6064 in statext_mcv_clauselist_selectivity (estimatedclauses=0x7ffceeb2cde8, rel=0x55b2bd2c4158,
    sjinfo=<optimized out>, jointype=<optimized out>, varRelid=<optimized out>, clauses=0x55b2bd2c4e00, root=<optimized
out>)
    at extended_stats.c:1146
#6  statext_clauselist_selectivity (root=root@entry=0x55b2bd2acff0, clauses=clauses@entry=0x55b2bd2c5010,
    varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x0, rel=0x55b2bd2c4158,
    estimatedclauses=0x7ffceeb2cde8) at extended_stats.c:1177
#7  0x000055b2bbd27372 in clauselist_selectivity (root=root@entry=0x55b2bd2acff0, clauses=0x55b2bd2c5010,
    varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x0) at clausesel.c:94
#8  0x000055b2bbd2d788 in set_baserel_size_estimates (root=root@entry=0x55b2bd2acff0, rel=rel@entry=0x55b2bd2c4158)
    at costsize.c:4411
#9  0x000055b2bbd24658 in set_plain_rel_size (rte=0x55b2bd20cf00, rel=0x55b2bd2c4158, root=0x55b2bd2acff0) at
allpaths.c:583
#10 set_rel_size (root=root@entry=0x55b2bd2acff0, rel=rel@entry=0x55b2bd2c4158, rti=rti@entry=1,
rte=rte@entry=0x55b2bd20cf00)
    at allpaths.c:412
#11 0x000055b2bbd264a0 in set_base_rel_sizes (root=<optimized out>) at allpaths.c:323
#12 make_one_rel (root=root@entry=0x55b2bd2acff0, joinlist=joinlist@entry=0x55b2bd2c49c0) at allpaths.c:185
#13 0x000055b2bbd482f8 in query_planner (root=root@entry=0x55b2bd2acff0,
    qp_callback=qp_callback@entry=0x55b2bbd48ed0 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7ffceeb2d070) at
planmain.c:271
#14 0x000055b2bbd4cb32 in grouping_planner (root=<optimized out>, inheritance_update=false, tuple_fraction=<optimized
out>)
    at planner.c:2048
#15 0x000055b2bbd4f900 in subquery_planner (glob=glob@entry=0x55b2bd2b1c88, parse=parse@entry=0x55b2bd20cd88,
    parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0)
    at planner.c:1012
#16 0x000055b2bbd509c6 in standard_planner (parse=0x55b2bd20cd88, cursorOptions=256, boundParams=<optimized out>) at
planner.c:406
#17 0x000055b2bbe13b89 in pg_plan_query (querytree=querytree@entry=0x55b2bd20cd88,
cursorOptions=cursorOptions@entry=256,
    boundParams=boundParams@entry=0x0) at postgres.c:878
[...]

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tom Lane

Date:

10 July 2019, 20:57:54

Andreas Seltenreich <seltenreich@gmx.de> writes:
> running sqlsmith on the regression database of REL_12_STABLE at
> ff597b656f yielded a crash in mcv_get_match_bitmap.  I can reproduce it
> with the following query on the regression database:
>     select filler1 from mcv_lists where a is not null and (select 42) <= c;

Seems to be the same problem I just complained of in the other
thread: mcv_get_match_bitmap has an untenable assumption that
"is a pseudoconstant" means "is a Const".

I notice it's also assuming that the Const must be non-null.
It's not really easy to crash it that way, because if you just
write "null <= c" that'll get reduced to constant-NULL earlier.
But I bet there's a way.

            regards, tom lane

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

10 July 2019, 21:26:09

On Wed, Jul 10, 2019 at 04:57:54PM -0400, Tom Lane wrote:
>Andreas Seltenreich <seltenreich@gmx.de> writes:
>> running sqlsmith on the regression database of REL_12_STABLE at
>> ff597b656f yielded a crash in mcv_get_match_bitmap.  I can reproduce it
>> with the following query on the regression database:
>>     select filler1 from mcv_lists where a is not null and (select 42) <= c;
>
>Seems to be the same problem I just complained of in the other
>thread: mcv_get_match_bitmap has an untenable assumption that
>"is a pseudoconstant" means "is a Const".
>

Yeah, that's a bug. Will fix (not sure how yet).

BTW which other thread? I don't see any other threads mentioning this
function.

>I notice it's also assuming that the Const must be non-null.
>It's not really easy to crash it that way, because if you just
>write "null <= c" that'll get reduced to constant-NULL earlier.
>But I bet there's a way.
>

Hmmm, I did not think of that.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tom Lane

Date:

10 July 2019, 21:31:05

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> BTW which other thread? I don't see any other threads mentioning this
> function.


https://www.postgresql.org/message-id/flat/CA%2Bu7OA65%2BjEFb_TyV5g%2BKq%2BonyJ2skMOPzgTgFH%2BqgLwszRqvw%40mail.gmail.com

            regards, tom lane

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tom Lane

Date:

10 July 2019, 21:45:24

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Yeah, that's a bug. Will fix (not sure how yet).

You could do worse than replace this:

        ok = (NumRelids(clause) == 1) &&
            (is_pseudo_constant_clause(lsecond(expr->args)) ||
             (varonleft = false,
              is_pseudo_constant_clause(linitial(expr->args))));

with something like

    if (IsA(linitial(expr->args), Var) &&
        IsA(lsecond(expr->args), Const))
       ok = true, varonleft = true;
    else if (IsA(linitial(expr->args), Const) &&
             IsA(lsecond(expr->args), Var))
       ok = true, varonleft = false;

Or possibly get rid of varonleft as such, and merge extraction of the
"var" and "cst" variables into this test.

BTW, I bet passing a unary-argument OpExpr also makes this code
unhappy.

            regards, tom lane

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

10 July 2019, 21:59:04

On Wed, Jul 10, 2019 at 05:45:24PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Yeah, that's a bug. Will fix (not sure how yet).
>
>You could do worse than replace this:
>
>        ok = (NumRelids(clause) == 1) &&
>            (is_pseudo_constant_clause(lsecond(expr->args)) ||
>             (varonleft = false,
>              is_pseudo_constant_clause(linitial(expr->args))));
>
>with something like
>
>    if (IsA(linitial(expr->args), Var) &&
>        IsA(lsecond(expr->args), Const))
>       ok = true, varonleft = true;
>    else if (IsA(linitial(expr->args), Const) &&
>             IsA(lsecond(expr->args), Var))
>       ok = true, varonleft = false;
>
>Or possibly get rid of varonleft as such, and merge extraction of the
>"var" and "cst" variables into this test.
>

OK, thanks for the suggestion.

I probably also need to look at the "is compatible" test in
extended_stats.c which also looks at the clauses. It may not crash as it
does not attempt to extract the const values etc. but it likely needs to
be in sync with this part.

>BTW, I bet passing a unary-argument OpExpr also makes this code
>unhappy.

Whooops :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tom Lane

Date:

10 July 2019, 22:48:16

Oh ... while we're piling on here, it just sunk into me that
mcv_get_match_bitmap is deciding what the semantics of an operator
are by seeing what it's using for a selectivity estimator.
That is just absolutely, completely wrong.  For starters, it
means that the whole mechanism fails for any operator that wants
to use a specialized estimator --- hardly an unreasonable thing
to do.  For another, it's going to be pretty unreliable for
extensions, because I do not think they're all careful about using
the right estimator --- a lot of 'em probably still haven't adapted
to the introduction of separate <= / >= estimators, for instance.

The right way to determine operator semantics is to look to see
whether they are in a btree opclass.  That's what the rest of the
planner does, and there is no good reason for the mcv code to
do it some other way.

            regards, tom lane

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Michael Paquier

Date:

11 July 2019, 00:55:01

On Wed, Jul 10, 2019 at 11:26:09PM +0200, Tomas Vondra wrote:
> Yeah, that's a bug. Will fix (not sure how yet).

Please note that I have added an open item for it.
--
Michael

Attachment

signature.asc

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

11 July 2019, 14:59:32

On Thu, Jul 11, 2019 at 09:55:01AM +0900, Michael Paquier wrote:
>On Wed, Jul 10, 2019 at 11:26:09PM +0200, Tomas Vondra wrote:
>> Yeah, that's a bug. Will fix (not sure how yet).
>
>Please note that I have added an open item for it.

Thanks.

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

11 July 2019, 15:08:22

On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>Oh ... while we're piling on here, it just sunk into me that
>mcv_get_match_bitmap is deciding what the semantics of an operator
>are by seeing what it's using for a selectivity estimator.
>That is just absolutely, completely wrong.  For starters, it
>means that the whole mechanism fails for any operator that wants
>to use a specialized estimator --- hardly an unreasonable thing
>to do.  For another, it's going to be pretty unreliable for
>extensions, because I do not think they're all careful about using
>the right estimator --- a lot of 'em probably still haven't adapted
>to the introduction of separate <= / >= estimators, for instance.
>
>The right way to determine operator semantics is to look to see
>whether they are in a btree opclass.  That's what the rest of the
>planner does, and there is no good reason for the mcv code to
>do it some other way.
>

Hmmm, but that will mean we're unable to estimate operators that are not
part of a btree opclass. Which is a bit annoying, because that would also
affect equalities (and thus functional dependencies), in which case the
type may easily have just hash opclass or something.

But maybe I just don't understand how the btree opclass detection works
and it'd be fine for sensibly defined operators. Can you point me to code
doing this elsewhere in the planner?

We may need to do something about code in pg10, because functional
dependencies this too (although just with F_EQSEL).  But maybe we should
leave pg10 alone, we got exactly 0 reports about it until now anyway.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

13 July 2019, 00:11:37

On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>Oh ... while we're piling on here, it just sunk into me that
>>mcv_get_match_bitmap is deciding what the semantics of an operator
>>are by seeing what it's using for a selectivity estimator.
>>That is just absolutely, completely wrong.  For starters, it
>>means that the whole mechanism fails for any operator that wants
>>to use a specialized estimator --- hardly an unreasonable thing
>>to do.  For another, it's going to be pretty unreliable for
>>extensions, because I do not think they're all careful about using
>>the right estimator --- a lot of 'em probably still haven't adapted
>>to the introduction of separate <= / >= estimators, for instance.
>>
>>The right way to determine operator semantics is to look to see
>>whether they are in a btree opclass.  That's what the rest of the
>>planner does, and there is no good reason for the mcv code to
>>do it some other way.
>>
>
>Hmmm, but that will mean we're unable to estimate operators that are not
>part of a btree opclass. Which is a bit annoying, because that would also
>affect equalities (and thus functional dependencies), in which case the
>type may easily have just hash opclass or something.
>
>But maybe I just don't understand how the btree opclass detection works
>and it'd be fine for sensibly defined operators. Can you point me to code
>doing this elsewhere in the planner?
>
>We may need to do something about code in pg10, because functional
>dependencies this too (although just with F_EQSEL).  But maybe we should
>leave pg10 alone, we got exactly 0 reports about it until now anyway.
>

Here are WIP patches addressing two of the issues:

1) determining operator semantics by matching them to btree opclasses

2) deconstructing OpExpr into Var/Const nodes

I'd appreciate some feedback particularly on (1).

For the two other issues mentioned in this thread:

a) I don't think unary-argument OpExpr are an issue, because this is
checked when determining which clauses are compatible (and the code only
allows the case with 2 arguments).

b) Const with constisnull=true - I'm not yet sure what to do about this.
The easiest fix would be to deem those clauses incompatible, but that
seems a bit too harsh. The right thing is probably passing the NULL to
the operator proc (but that means we can't use FunctionCall).

Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
calling the operator is the right thing. We're using type->typcollation
when building the stats, so maybe we should do the same thing here.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tom Lane

Date:

13 July 2019, 15:39:55

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>> The right way to determine operator semantics is to look to see
>>> whether they are in a btree opclass.  That's what the rest of the
>>> planner does, and there is no good reason for the mcv code to
>>> do it some other way.

>> Hmmm, but that will mean we're unable to estimate operators that are not
>> part of a btree opclass. Which is a bit annoying, because that would also
>> affect equalities (and thus functional dependencies), in which case the
>> type may easily have just hash opclass or something.

After thinking about this more, I may have been analogizing to the wrong
code.  It's necessary to use opclass properties when we're reasoning about
operators in a way that *must* be correct, for instance to conclude that
a partition can be pruned from a query.  But this code is just doing
selectivity estimation, so the correctness standards are a lot lower.
In particular I see that the long-established range-query-detection
code in clauselist_selectivity is looking for operators that have
F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
lifted parts of the mcv code from that, cause it looks pretty similar).
So if we've gotten away with that so far over there, there's probably
no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.

> Here are WIP patches addressing two of the issues:

> 1) determining operator semantics by matching them to btree opclasses

Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.

> 2) deconstructing OpExpr into Var/Const nodes

deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

    leftop = linitial(expr->args);
    while (IsA(leftop, RelabelType))
        leftop = ((RelabelType *) leftop)->arg;
    // and similarly for rightop
    if (IsA(leftop, Var) && IsA(rightop, Const))
        // return appropriate results
    else if (IsA(leftop, Const) && IsA(rightop, Var))
        // return appropriate results
    else
        // fail

Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing.  It'd be okay as a static function name
perhaps, but not if you're going to expose it globally.

> a) I don't think unary-argument OpExpr are an issue, because this is
> checked when determining which clauses are compatible (and the code only
> allows the case with 2 arguments).

OK.

> b) Const with constisnull=true - I'm not yet sure what to do about this.
> The easiest fix would be to deem those clauses incompatible, but that
> seems a bit too harsh. The right thing is probably passing the NULL to
> the operator proc (but that means we can't use FunctionCall).

No, because most of the functions in question are strict and will just
crash on null inputs.  Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.

> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
> calling the operator is the right thing. We're using type->typcollation
> when building the stats, so maybe we should do the same thing here.

Yeah, I was wondering that too.  But really you should be using the
column's collation not the type's default collation.  See commit
5e0928005.

            regards, tom lane

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

13 July 2019, 20:43:42

On Sat, Jul 13, 2019 at 11:39:55AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>>> The right way to determine operator semantics is to look to see
>>>> whether they are in a btree opclass.  That's what the rest of the
>>>> planner does, and there is no good reason for the mcv code to
>>>> do it some other way.
>
>>> Hmmm, but that will mean we're unable to estimate operators that are not
>>> part of a btree opclass. Which is a bit annoying, because that would also
>>> affect equalities (and thus functional dependencies), in which case the
>>> type may easily have just hash opclass or something.
>
>After thinking about this more, I may have been analogizing to the wrong
>code.  It's necessary to use opclass properties when we're reasoning about
>operators in a way that *must* be correct, for instance to conclude that
>a partition can be pruned from a query.  But this code is just doing
>selectivity estimation, so the correctness standards are a lot lower.
>In particular I see that the long-established range-query-detection
>code in clauselist_selectivity is looking for operators that have
>F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
>lifted parts of the mcv code from that, cause it looks pretty similar).
>So if we've gotten away with that so far over there, there's probably
>no reason not to do likewise here.
>
>I am a little troubled by the point I made about operators possibly
>wanting to have a more-specific estimator than scalarltsel, but that
>seems like an issue to solve some other time; and if we do change that
>logic then clauselist_selectivity needs to change too.
>
>> Here are WIP patches addressing two of the issues:
>
>> 1) determining operator semantics by matching them to btree opclasses
>
>Per above, I'm sort of inclined to drop this, unless you feel better
>about doing it like this than the existing way.
>

OK. TBH I don't have a very strong opinion on this - I always disliked
how we rely on the estimator OIDs in this code, and relying on btree
opclasses seems somewhat more principled. But I'm not sure I understand
all the implications of such change (and I have some concerns about it
too, per my last message), so I'd revisit that in PG13.

>> 2) deconstructing OpExpr into Var/Const nodes
>
>deconstruct_opexpr is still failing to verify that the Var is a Var.
>I'd try something like
>
>    leftop = linitial(expr->args);
>    while (IsA(leftop, RelabelType))
>        leftop = ((RelabelType *) leftop)->arg;
>    // and similarly for rightop
>    if (IsA(leftop, Var) && IsA(rightop, Const))
>        // return appropriate results
>    else if (IsA(leftop, Const) && IsA(rightop, Var))
>        // return appropriate results
>    else
>        // fail
>

Ah, right. The RelabelType might be on top of something that's not Var.

>Also, I think deconstruct_opexpr is far too generic a name for what
>this is actually doing.  It'd be okay as a static function name
>perhaps, but not if you're going to expose it globally.
>

I agree. I can't quite make it static, because it's used from multiple
places, but I'll move it to extended_stats_internal.h (and I'll see if I
can think of a better name too).

>> a) I don't think unary-argument OpExpr are an issue, because this is
>> checked when determining which clauses are compatible (and the code only
>> allows the case with 2 arguments).
>
>OK.
>
>> b) Const with constisnull=true - I'm not yet sure what to do about this.
>> The easiest fix would be to deem those clauses incompatible, but that
>> seems a bit too harsh. The right thing is probably passing the NULL to
>> the operator proc (but that means we can't use FunctionCall).
>
>No, because most of the functions in question are strict and will just
>crash on null inputs.  Perhaps you could just deem that cases involving
>a null Const don't match what you're looking for.
>

Makes sense, I'll do that.

>> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
>> calling the operator is the right thing. We're using type->typcollation
>> when building the stats, so maybe we should do the same thing here.
>
>Yeah, I was wondering that too.  But really you should be using the
>column's collation not the type's default collation.  See commit
>5e0928005.
>

OK, thanks.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

15 July 2019, 01:34:25

OK, attached is a sequence of WIP fixes for the issues discussed here.


1) using column-specific collations (instead of type/default ones)

The collations patch is pretty simple, but I'm not sure it actually does
the right thing, particularly during estimation where it uses collation
from the Var node (varcollid). But looking at 5e0928005, this should use
the same collation as when building the extended statistics (which we
get from the per-column stats, as stored in pg_statistic.stacoll#).

But we don't actually store collations for extended statistics, so we
can either modify pg_statistic_ext_data and store it there, or lookup
the per-column statistic info during estimation, and use that. I kinda
think the first option is the right one, but that'd mean yet another
catversion bump.

OTOH 5e0928005 actually did modify the extended statistics (mvdistinct
and dependencies) to use type->typcollation during building, so maybe we
want to use the default type collation for some reason?


2) proper extraction of Var/Const from opclauses

This is the primary issue discussed in this thread - I've renamed the
function to examine_opclause_expression() so that it kinda resembles
examine_variable() and I've moved it to the "internal" header file. We
still need it from two places so it can't be static, but hopefully this
naming is acceptable.


3) handling of NULL values (Const and MCV items)

Aside from the issue that Const may represent NULL, I've realized the
code might do the wrong thing for NULL in the MCV item itself. It did
treat it as mismatch and update the bitmap, but it might have invoke the
operator procedure anyway (depending on whether it's AND/OR clause,
what's the current value in the bitmap, etc.). This patch should fix
both issues by treating them as mismatch, and skipping the proc call.


4) refactoring of the bitmap updates

This is not a bug per se, but while working on (3) I've realized the
code updating the bitmap is quite repetitive and it does stuff like

   if (is_or)
      bitmap[i] = Max(bitmap[i], match)
   else
      bitmap[i] = Min(bitmap[i], match)

over and over on various places. This moves this into a separate static
function, which I think makes it way more readable. Also, it replaces
the Min/Max with a plain boolean operators (the patch originally used
three states, not true/false, hence the Min/Max).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

18 July 2019, 11:20:01

Hi,

I've pushed the fixes listed in the previous message, with the exception
of the collation part, because I had some doubts about that.

1) handling of NULL values in Cons / MCV items

The handling of NULL elements was actually a bit more broken, because it
also was not quite correct for NULL values in the MCV items. The code
treated this as a mismatch, but then skipped the rest of the evaluation
only for AND-clauses (because then 'false' is final). But for OR-clauses
it happily proceeded to call the proc, etc. It was not hard to cause a
crash with statistics on varlena columns.

I've fixed this and added a simple regression test to check this. It
however shows the stats_ext suite needs some improvements - until now it
only had AND-clauses. Now it has one simple OR-clause test, but it needs
more of that - and perhaps some combinations mixing AND/OR. I've tried
adding an copy of each existing query, with AND replaced by OR, and that
works fine (no crashes, estimates seem OK). But it's quite heavy-handed
way to create regression tests, so I'll look into this in PG13 cycle.


2) collations

Now, for the collation part - after some more thought and looking at code
I think the fix I shared before is OK. It uses per-column collations
consistently both when building the stats and estimating clauses, which
makes it consistent. I was not quite sure if using Var->varcollid is the
same thing as pg_attribute.attcollation, but it seems it is - at least for
Vars pointing to regular columns (which for extended stats should always
be the case).

And we reset stats whenever the attribute type changes (which includes
change of collation for the column), so I think it's OK. To be precise, we
only reset MCV list in that case - we keep mvdistinct and dependencies,
but that's probably OK because those don't store values and we won't
run any functions on them.

So I think the attached patch is OK, but I'd welcome some feedback.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

0001-Use-column-collation-for-extended-statistics.patch

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tom Lane

Date:

18 July 2019, 15:16:08

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I've pushed the fixes listed in the previous message, with the exception
> of the collation part, because I had some doubts about that.

Sorry for being slow on this.

> Now, for the collation part - after some more thought and looking at code
> I think the fix I shared before is OK. It uses per-column collations
> consistently both when building the stats and estimating clauses, which
> makes it consistent. I was not quite sure if using Var->varcollid is the
> same thing as pg_attribute.attcollation, but it seems it is - at least for
> Vars pointing to regular columns (which for extended stats should always
> be the case).

I think you are right, but it could use some comments in the code.
The important point here is that if we coerce a Var's collation to
something else, that will be represented as a separate CollateExpr
(which will be a RelabelType by the time it gets here, I believe).
We don't just replace varcollid, the way eval_const_expressions will
do to a Const.

While I'm looking at the code --- I don't find this at all convincing:

                            /*
                             * We don't care about isgt in equality, because
                             * it does not matter whether it's (var op const)
                             * or (const op var).
                             */
                            match = DatumGetBool(FunctionCall2Coll(&opproc,
                                                                   DEFAULT_COLLATION_OID,
                                                                   cst->constvalue,
                                                                   item->values[idx]));

It *will* matter if the operator is cross-type.  I think there is no
good reason to have different code paths for the equality and inequality
cases --- just look at isgt and swap or don't swap the arguments.

BTW, "isgt" seems like a completely misleading name for that variable.
AFAICS, what that is actually telling is whether the var is on the left
or right side of the OpExpr.  Everywhere else in the planner with a
need for that uses "bool varonleft", and I think you should do likewise
here (though note that that isgt, as coded, is more like "varonright").

            regards, tom lane

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

19 July 2019, 16:46:35

On Thu, Jul 18, 2019 at 11:16:08AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I've pushed the fixes listed in the previous message, with the exception
>> of the collation part, because I had some doubts about that.
>
>Sorry for being slow on this.
>
>> Now, for the collation part - after some more thought and looking at code
>> I think the fix I shared before is OK. It uses per-column collations
>> consistently both when building the stats and estimating clauses, which
>> makes it consistent. I was not quite sure if using Var->varcollid is the
>> same thing as pg_attribute.attcollation, but it seems it is - at least for
>> Vars pointing to regular columns (which for extended stats should always
>> be the case).
>
>I think you are right, but it could use some comments in the code.
>The important point here is that if we coerce a Var's collation to
>something else, that will be represented as a separate CollateExpr
>(which will be a RelabelType by the time it gets here, I believe).
>We don't just replace varcollid, the way eval_const_expressions will
>do to a Const.
>

OK, thanks. I've added a comment about that into mcv_get_match_bitmap (not
all the details about RelabelType, because that gets stripped while
examining the opexpr, but generally about the collations).

>
>While I'm looking at the code --- I don't find this at all convincing:
>
>                            /*
>                             * We don't care about isgt in equality, because
>                             * it does not matter whether it's (var op const)
>                             * or (const op var).
>                             */
>                            match = DatumGetBool(FunctionCall2Coll(&opproc,
>                                                                   DEFAULT_COLLATION_OID,
>                                                                   cst->constvalue,
>                                                                   item->values[idx]));
>
>It *will* matter if the operator is cross-type.  I think there is no
>good reason to have different code paths for the equality and inequality
>cases --- just look at isgt and swap or don't swap the arguments.
>
>BTW, "isgt" seems like a completely misleading name for that variable.
>AFAICS, what that is actually telling is whether the var is on the left
>or right side of the OpExpr.  Everywhere else in the planner with a
>need for that uses "bool varonleft", and I think you should do likewise
>here (though note that that isgt, as coded, is more like "varonright").
>

Yes, you're right in both cases. I've fixed the first issue with equality
by simply using the same code as for all operators (which means we don't
need to check the oprrest at all in this code, making it simpler).

And I've reworked the examine_opclause_expression() to use varonleft
naming - I agree it's a much better name. This was one of the remnants of
the code I initially copied from somewhere in selfuncs.c and massaged
it until it did what I needed.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tom Lane

Date:

19 July 2019, 18:37:19

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> [ mcv fixes ]

These patches look OK to me.

            regards, tom lane

Re: [sqlsmith] Crash in mcv_get_match_bitmap

From

Tomas Vondra

Date:

20 July 2019, 14:38:16

On Fri, Jul 19, 2019 at 02:37:19PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> [ mcv fixes ]
>
>These patches look OK to me.
>

OK, thanks. Pushed.


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services