Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id 56EAE73B.3010603@sigaev.ru
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers

> I also wonder whether the patch should add explanation of OR-clauses
> handling into the READMEs in src/backend/access/*

Oops, will add shortly.
>
> The patch would probably benefit from transforming it into a patch
> series - one patch for the infrastructure shared by all the indexes,
> then one patch per index type. That should make it easier to review, and
> I seriously doubt we'd want to commit this in one huge chunk anyway.
Ok, will do it.

> 1) fields in BrinOpaque are not following the naming convention (all the
> existing fields start with bo_)
fixed

>
> 2) there's plenty of places violating the usual code style (e.g. for
> single-command if branches) - not a big deal for WIP patch, but needs to
> get fixed eventually
hope, fixed

>
> 3) I wonder whether we really need both SK_OR and SK_AND, considering
> they are mutually exclusive. Why not to assume SK_AND by default, and
> only use SK_OR? If we really need them, perhaps an assert making sure
> they are not set at the same time would be appropriate.
In short: possible ambiguity and increasing stack machine complexity.
Let we have follow expression in reversed polish notation (letters represent a
condtion, | - OR, & - AND logical operation, ANDs are omitted):
a b c |

Is it ((a & b)| c) or (a & (b | c)) ?

Also, using both SK_ makes code more readable.


> 4) scanGetItem is a prime example of the "badly needs comments" issue,
> particularly because the previous version of the function actually had
> quite a lot of them while the new function has none.
Will add soon

>
> 5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
> initialized when (!leftFinished && !rightFinished), but then gets used
> when either part of the condition evaluates to true. Probably should be
>
>      if (!leftFinished || !rightFinished)
>          cmp = ...
fixed

>
> 6) the code in nodeIndexscan.c should not include call to abort()
>
>      {
>          abort();
>          elog(ERROR, "unsupported indexqual type: %d",
>              (int) nodeTag(clause));
>      }
fixed, just forgot to remove

>
> 7) I find it rather ugly that the paths are built by converting BitmapOr
> paths. Firstly, it means indexes without amgetbitmap can't benefit from
> this change. Maybe that's reasonable limitation, though?
I based on following thoughts:
1 code which tries to find OR-index path will be very similar to existing
   generate_or_bitmap code. Obviously, it should not be duplicated.
2 all existsing indexes have amgetbitmap method, only a few don't. amgetbitmap
   interface is simpler. Anyway, I can add an option for generate_or_bitmap
   to use any index, but, in current state it will just repeat all work.

>
> But more importantly, this design already has a bunch of unintended
> consequences. For example, the current code completely ignores
> enable_indexscan setting, because it merely copies the costs from the
> bitmap path.
I'd like to add separate enable_indexorscan

> That's pretty dubious, I guess. So this code probably needs to be made
> aware of enable_indexscan - right now it entirely ignores startup_cost
> in convert_bitmap_path_to_index_clause(). But of course if there are
> multiple IndexPaths, the  enable_indexscan=off will be included multiple
> times.
>
> 9) This already breaks estimation for some reason. Consider this
...
> So the OR-clause is estimated to match 0 rows, less than each of the
> clauses independently. Needless to say that without the patch this works
> just fine.
fixed

>
> 10) Also, this already breaks some regression tests, apparently because
> it changes how 'width' is computed.
fixed too


> So I think this way of building the index path from a BitmapOr path is
> pretty much a dead-end.
I don't think so because separate code path to support OR-clause in index will
significanlty duplicate BitmapOr generator.

Will send next version as soon as possible. Thank you for your attention!

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)
Next
From: David Steele
Date:
Subject: Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)