Re: [sqlsmith] Crash in mcv_get_match_bitmap - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [sqlsmith] Crash in mcv_get_match_bitmap |
Date | |
Msg-id | 20190713204342.pysg4odqj2j6gn3o@development Whole thread Raw |
In response to | Re: [sqlsmith] Crash in mcv_get_match_bitmap (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [sqlsmith] Crash in mcv_get_match_bitmap
|
List | pgsql-hackers |
On Sat, Jul 13, 2019 at 11:39:55AM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote: >>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote: >>>> The right way to determine operator semantics is to look to see >>>> whether they are in a btree opclass. That's what the rest of the >>>> planner does, and there is no good reason for the mcv code to >>>> do it some other way. > >>> Hmmm, but that will mean we're unable to estimate operators that are not >>> part of a btree opclass. Which is a bit annoying, because that would also >>> affect equalities (and thus functional dependencies), in which case the >>> type may easily have just hash opclass or something. > >After thinking about this more, I may have been analogizing to the wrong >code. It's necessary to use opclass properties when we're reasoning about >operators in a way that *must* be correct, for instance to conclude that >a partition can be pruned from a query. But this code is just doing >selectivity estimation, so the correctness standards are a lot lower. >In particular I see that the long-established range-query-detection >code in clauselist_selectivity is looking for operators that have >F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you >lifted parts of the mcv code from that, cause it looks pretty similar). >So if we've gotten away with that so far over there, there's probably >no reason not to do likewise here. > >I am a little troubled by the point I made about operators possibly >wanting to have a more-specific estimator than scalarltsel, but that >seems like an issue to solve some other time; and if we do change that >logic then clauselist_selectivity needs to change too. > >> Here are WIP patches addressing two of the issues: > >> 1) determining operator semantics by matching them to btree opclasses > >Per above, I'm sort of inclined to drop this, unless you feel better >about doing it like this than the existing way. > OK. TBH I don't have a very strong opinion on this - I always disliked how we rely on the estimator OIDs in this code, and relying on btree opclasses seems somewhat more principled. But I'm not sure I understand all the implications of such change (and I have some concerns about it too, per my last message), so I'd revisit that in PG13. >> 2) deconstructing OpExpr into Var/Const nodes > >deconstruct_opexpr is still failing to verify that the Var is a Var. >I'd try something like > > leftop = linitial(expr->args); > while (IsA(leftop, RelabelType)) > leftop = ((RelabelType *) leftop)->arg; > // and similarly for rightop > if (IsA(leftop, Var) && IsA(rightop, Const)) > // return appropriate results > else if (IsA(leftop, Const) && IsA(rightop, Var)) > // return appropriate results > else > // fail > Ah, right. The RelabelType might be on top of something that's not Var. >Also, I think deconstruct_opexpr is far too generic a name for what >this is actually doing. It'd be okay as a static function name >perhaps, but not if you're going to expose it globally. > I agree. I can't quite make it static, because it's used from multiple places, but I'll move it to extended_stats_internal.h (and I'll see if I can think of a better name too). >> a) I don't think unary-argument OpExpr are an issue, because this is >> checked when determining which clauses are compatible (and the code only >> allows the case with 2 arguments). > >OK. > >> b) Const with constisnull=true - I'm not yet sure what to do about this. >> The easiest fix would be to deem those clauses incompatible, but that >> seems a bit too harsh. The right thing is probably passing the NULL to >> the operator proc (but that means we can't use FunctionCall). > >No, because most of the functions in question are strict and will just >crash on null inputs. Perhaps you could just deem that cases involving >a null Const don't match what you're looking for. > Makes sense, I'll do that. >> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when >> calling the operator is the right thing. We're using type->typcollation >> when building the stats, so maybe we should do the same thing here. > >Yeah, I was wondering that too. But really you should be using the >column's collation not the type's default collation. See commit >5e0928005. > OK, thanks. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date:
Previous
From: Tom LaneDate:
Subject: Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0)
Next
From: Tomas VondraDate:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)