Re: [INTERFACES] Re: [HACKERS] changes in 6.4 - Mailing list pgsql-hackers

From David Hartwig
Subject Re: [INTERFACES] Re: [HACKERS] changes in 6.4
Date
Msg-id 35E0ABF0.578694C8@bellatlantic.net
Whole thread Raw
In response to Re: [INTERFACES] Re: [HACKERS] changes in 6.4  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [INTERFACES] Re: [HACKERS] changes in 6.4  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [INTERFACES] Re: [HACKERS] changes in 6.4  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers

Bruce Momjian wrote:

> >
> > Hannu Krosing wrote:
> >
> > > > The days where every release fixed server crashes, or added a feature
> > > > that users were 'screaming for' may be a thing of the past.
> > >
> > > Is anyone working on fixing the exploding optimisations for many OR-s,
> > > at least the canonic case used by access?
> > >
> > > My impression is that this has fallen somewhere between
> > > insightdist and Vadim.
> >
> > This is really big for the ODBCers. (And I suspect for JDBCers too.)  Many
> > desktop libraries and end-user tools depend on this "record set" strategy to
> > operate effectively.
> >
> > I have put together a workable hack that runs just before cnfify().  The
> > option is activated through the SET command.  Once activated, it identifies
> > queries with this particular multi-OR pattern generated by these RECORD SET
> > strategies.  Qualified query trees are rewritten as multiple UNIONs.   (One
> > for each OR grouping).
> >
> > The results are profound.    Queries that used to scan tables because of the
> > ORs, now make use of any indexes.   Thus, the size of the table has virtually
> > no effect on performance.  Furthermore, queries that used to crash the
> > backend, now run in under a second.
> >
> > Currently the down sides are:
> >     1. If there is no usable index, performance is significantly worse.  The
> > patch does not check to make sure that there is a usable index.  I could use
> > some pointers on this.
> >
> >     2. Small tables are actually a bit slower than without the patch.
> >
> >     3.  Not very elegant.    I am looking for a more generalized solution.
> > I have lots of ideas, but I would need to know the backend much better before
> > attempting any of them.   My favorite idea is before cnfify(), to factor the
> > OR terms and pull out the constants into a virtual (temporary) table spaces.
> > Then rewrite the query as a join.   The optimizer will (should) treat the new
> > query accordingly.  This assumes that an efficient factoring algorithm exists
> > and that temporary tables can exist in the heap.
> >
> > Illustration:
> > SELECT ... FROM tab WHERE
> > (var1 = const1 AND var2 = const2) OR
> > (var1 = const3 AND var2 = const4) OR
> > (var1 = const5 AND var2 = const6)
> >
> > SELECT ... FROM tab, tmp WHERE
> > (var1 = var_x AND var2 = var_y)
> >
> > tmp
> > var_x  | var_y
> > --------------
> > const1|const2
> > const3|const4
> > const5|const6
>
> David, where are we on this?  I know we have OR's using indexes.  Do we
> still need to look this as a fix, or are we OK.   I have not gotten far
> enough in the optimizer to know how to fix the

Bruce,

If the question is, have I come up with a solution for the cnf'ify problem:  No

If the question is, is it still important:  Very much yes.

It is essential for many RAD tools using remote data objects which make use of key
sets.  Your recent optimization of the OR list goes a long way, but inevitably
users are confronted with multi-part keys.

When I look at the problem my head spins.   I do not have the experience (yet?)
with the backend to be mucking around in the optimizer.  As I see it, cnf'ify is
doing just what it is supposed to do.  Boundless boolean logic.

I think hope may lay though, in identifying each AND'ed group associated with a key
and tagging it as a special sub-root node which cnf'ify does not penetrate.   This
node would be allowed to pass to the later stages of the optimizer where it will be
used to plan index scans.  Easy for me to say.

In the meantime, I still have the patch that I described in prior email.  It has
worked well for us.  Let me restate that.   We could not survive without it!
However, I do not feel that is a sufficiently functional approach that should be
incorporated as a final solution.     I will submit the patch if you, (anyone) does
not come up with a better solution.  It is coded to be activated by a SET KSQO to
minimize its reach.


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Autoconf'd test for int64
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] initdb problem