Re: avoid bitmapOR-ing indexes with scan condition inconsistent with partition constraint - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: avoid bitmapOR-ing indexes with scan condition inconsistent with partition constraint |
Date | |
Msg-id | CAD21AoC9e74EoznfxOTXQZYgXxtz5T6OFP06eDNqSoovBZiuSQ@mail.gmail.com Whole thread Raw |
In response to | Re: avoid bitmapOR-ing indexes with scan condition inconsistent with partition constraint (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Mon, Feb 1, 2021 at 12:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > On Fri, Nov 13, 2020 at 5:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > I started looking through this patch. I really quite dislike solving > > this via a kluge in indxpath.c. There are multiple disadvantages > > to that: > > > > * It only helps for the very specific problem of redundant bitmap > > index scans, whereas the problem of applying redundant qual checks > > in partition scans seems pretty general. > > > > * It's not unlikely that this will end up trying to make the same > > proof multiple times (and the lack of any way to turn that off, > > through constraint_exclusion or some other knob, isn't too cool). > > > > * It does nothing to fix rowcount estimates in the light of the > > knowledge that some of the restriction clauses are no-ops. Now, > > if we have up-to-date stats we'll probably manage to come out with > > an appropriate 0 or 1 selectivity anyway, but we might not have those. > > In any case, spending significant effort to estimate a selectivity > > when some other part of the code has taken the trouble to *prove* the > > clause true or false seems very undesirable. > > > > * I'm not even convinced that the logic is correct, specifically that > > it's okay to just "continue" if we refute the OR clause. That seems > > likely to break generate_bitmap_or_paths' surrounding loop logic about > > "We must be able to match at least one index to each of the arms of > > the OR". At least, if that still works it requires more than zero > > commentary about why. > > > > > > So I like much better the idea of Konstantin's old patch, that we modify > > the rel's baserestrictinfo list by removing quals that we can prove > > true. We could extend that to solve the bitmapscan problem by removing > > OR arms that we can prove false. So I started to review that patch more > > carefully, and after awhile realized that it has a really fundamental > > problem: it is trying to use CHECK predicates to prove WHERE clauses. > > But we don't know that CHECK predicates are true, only that they are > > not-false, and there is no proof mode in predtest.c that will allow > > proving some clauses true based only on other ones being not-false. > > > > We can salvage something by restricting the input quals to be only > > partition quals, since those are built to be guaranteed-true-or-false; > > we can assume they don't yield NULL. There's a hole in that for > > hashing, as I noted elsewhere, but we'll fail to prove anything anyway > > from a satisfies_hash_partition() qual. (In principle we could also use > > attnotnull quals, which also have that property. But I'm dubious that > > that will help often enough to be worth the extra cycles for predtest.c > > to process them.) > > > > So after a bit of coding I had the attached. This follows Konstantin's > > original patch in letting relation_excluded_by_constraints() change > > the baserestrictinfo list. I read the comments in the older thread > > about people not liking that, and I can see the point. But I'm not > > convinced that the later iterations of the patch were an improvement, > > because (a) the call locations for > > remove_restrictions_implied_by_constraints() seemed pretty random, and > > (b) it seems necessary to have relation_excluded_by_constraints() and > > remove_restrictions_implied_by_constraints() pretty much in bed with > > each other if we don't want to duplicate constraint-fetching work. > > Note the comment on get_relation_constraints() that it's called at > > most once per relation; that's not something I particularly desire > > to give up, because a relcache open isn't terribly cheap. Also > > (c) I think it's important that there be a way to suppress this > > overhead when it's not useful. In the patch as attached, turning off > > constraint_exclusion does that since relation_excluded_by_constraints() > > falls out before getting to the new code. If we make > > remove_restrictions_implied_by_constraints() independent then it > > will need some possibly-quite-duplicative logic to check > > constraint_exclusion. (Of course, if we'd rather condition this > > on some other GUC then that argument falls down. But I think we > > need something.) So, I'm not dead set on this code structure, but > > I haven't seen one I like better. > > > > Anyway, this seems to work, and if the regression test changes are > > any guide then it may fire often enough in the real world to be useful. > > Nonetheless, I'm concerned about performance, because predtest.c is a > > pretty expensive thing and there will be a lot of cases where the work > > is useless. I did a quick check using pgbench's option to partition > > the tables, and observed that the -S (select only) test case seemed to > > get about 2.5% slower with the patch than without. That's not far > > outside the noise floor, so maybe it's not real, but if it is real then > > it seems pretty disastrous. Perhaps we could avoid that problem by > > removing the "predicate_implied_by" cases and only trying the > > "predicate_refuted_by" case, so that no significant time is added > > unless you've got an OR restriction clause on a partitioned table. > > That seems like it'd lose a lot of the benefit though :-(. > > > > So I'm not sure where to go from here. Thoughts? Anyone else > > care to run some performance tests? > > Status update for a commitfest entry. > > Reading through the discussion, several patches have been proposed and > it has been inactive for almost 2 months. Does anyone listed as the > author plan to work on this patch? It looks like we're waiting for > some reviews on the patch including from the performance perspective > but this patch entry has been set to "Waiting on Author" since > 2021-01-12. If no one works on this and it's really waiting on the > author, I'm going to set it to "Returned with Feedback", barring > objections. I've moved this patch to "Returned with Feedback". Depending on timing, this may be reversable, so let us know if there are extenuating circumstances. In any case, you are welcome to address the feedback you have received, and resubmit the patch to the next CommitFest. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: