Re: [INTERFACES] Re: [HACKERS] changes in 6.4 - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [INTERFACES] Re: [HACKERS] changes in 6.4
Date
Msg-id 199909182010.QAA18416@candle.pha.pa.us
Whole thread Raw
In response to Re: [INTERFACES] Re: [HACKERS] changes in 6.4  (David Hartwig <daybee@bellatlantic.net>)
Responses PERL
List pgsql-hackers
This is an old message, but still relivant.  I belive 6.6 will have much
better OR memory usage.

> 
> 
> Bruce Momjian wrote:
> 
> > >
> > > Hannu Krosing wrote:
> > >
> > > > > The days where every release fixed server crashes, or added a feature
> > > > > that users were 'screaming for' may be a thing of the past.
> > > >
> > > > Is anyone working on fixing the exploding optimisations for many OR-s,
> > > > at least the canonic case used by access?
> > > >
> > > > My impression is that this has fallen somewhere between
> > > > insightdist and Vadim.
> > >
> > > This is really big for the ODBCers. (And I suspect for JDBCers too.)  Many
> > > desktop libraries and end-user tools depend on this "record set" strategy to
> > > operate effectively.
> > >
> > > I have put together a workable hack that runs just before cnfify().  The
> > > option is activated through the SET command.  Once activated, it identifies
> > > queries with this particular multi-OR pattern generated by these RECORD SET
> > > strategies.  Qualified query trees are rewritten as multiple UNIONs.   (One
> > > for each OR grouping).
> > >
> > > The results are profound.    Queries that used to scan tables because of the
> > > ORs, now make use of any indexes.   Thus, the size of the table has virtually
> > > no effect on performance.  Furthermore, queries that used to crash the
> > > backend, now run in under a second.
> > >
> > > Currently the down sides are:
> > >     1. If there is no usable index, performance is significantly worse.  The
> > > patch does not check to make sure that there is a usable index.  I could use
> > > some pointers on this.
> > >
> > >     2. Small tables are actually a bit slower than without the patch.
> > >
> > >     3.  Not very elegant.    I am looking for a more generalized solution.
> > > I have lots of ideas, but I would need to know the backend much better before
> > > attempting any of them.   My favorite idea is before cnfify(), to factor the
> > > OR terms and pull out the constants into a virtual (temporary) table spaces.
> > > Then rewrite the query as a join.   The optimizer will (should) treat the new
> > > query accordingly.  This assumes that an efficient factoring algorithm exists
> > > and that temporary tables can exist in the heap.
> > >
> > > Illustration:
> > > SELECT ... FROM tab WHERE
> > > (var1 = const1 AND var2 = const2) OR
> > > (var1 = const3 AND var2 = const4) OR
> > > (var1 = const5 AND var2 = const6)
> > >
> > > SELECT ... FROM tab, tmp WHERE
> > > (var1 = var_x AND var2 = var_y)
> > >
> > > tmp
> > > var_x  | var_y
> > > --------------
> > > const1|const2
> > > const3|const4
> > > const5|const6
> >
> > David, where are we on this?  I know we have OR's using indexes.  Do we
> > still need to look this as a fix, or are we OK.   I have not gotten far
> > enough in the optimizer to know how to fix the
> 
> Bruce,
> 
> If the question is, have I come up with a solution for the cnf'ify problem:  No
> 
> If the question is, is it still important:  Very much yes.
> 
> It is essential for many RAD tools using remote data objects which make use of key
> sets.  Your recent optimization of the OR list goes a long way, but inevitably
> users are confronted with multi-part keys.
> 
> When I look at the problem my head spins.   I do not have the experience (yet?)
> with the backend to be mucking around in the optimizer.  As I see it, cnf'ify is
> doing just what it is supposed to do.  Boundless boolean logic.
> 
> I think hope may lay though, in identifying each AND'ed group associated with a key
> and tagging it as a special sub-root node which cnf'ify does not penetrate.   This
> node would be allowed to pass to the later stages of the optimizer where it will be
> used to plan index scans.  Easy for me to say.
> 
> In the meantime, I still have the patch that I described in prior email.  It has
> worked well for us.  Let me restate that.   We could not survive without it!
> However, I do not feel that is a sufficiently functional approach that should be
> incorporated as a final solution.     I will submit the patch if you, (anyone) does
> not come up with a better solution.  It is coded to be activated by a SET KSQO to
> minimize its reach.
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Why do we need pg_vlock?
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Why do we need pg_vlock?