Re: [INTERFACES] Re: [HACKERS] changes in 6.4 - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [INTERFACES] Re: [HACKERS] changes in 6.4
Date
Msg-id 199808220353.XAA04528@candle.pha.pa.us
Whole thread Raw
In response to Re: [INTERFACES] Re: [HACKERS] changes in 6.4  (David Hartwig <daveh@insightdist.com>)
List pgsql-hackers
>
> Hannu Krosing wrote:
>
> > > The days where every release fixed server crashes, or added a feature
> > > that users were 'screaming for' may be a thing of the past.
> >
> > Is anyone working on fixing the exploding optimisations for many OR-s,
> > at least the canonic case used by access?
> >
> > My impression is that this has fallen somewhere between
> > insightdist and Vadim.
>
> This is really big for the ODBCers. (And I suspect for JDBCers too.)  Many
> desktop libraries and end-user tools depend on this "record set" strategy to
> operate effectively.
>
> I have put together a workable hack that runs just before cnfify().  The
> option is activated through the SET command.  Once activated, it identifies
> queries with this particular multi-OR pattern generated by these RECORD SET
> strategies.  Qualified query trees are rewritten as multiple UNIONs.   (One
> for each OR grouping).
>
> The results are profound.    Queries that used to scan tables because of the
> ORs, now make use of any indexes.   Thus, the size of the table has virtually
> no effect on performance.  Furthermore, queries that used to crash the
> backend, now run in under a second.
>
> Currently the down sides are:
>     1. If there is no usable index, performance is significantly worse.  The
> patch does not check to make sure that there is a usable index.  I could use
> some pointers on this.
>
>     2. Small tables are actually a bit slower than without the patch.
>
>     3.  Not very elegant.    I am looking for a more generalized solution.
> I have lots of ideas, but I would need to know the backend much better before
> attempting any of them.   My favorite idea is before cnfify(), to factor the
> OR terms and pull out the constants into a virtual (temporary) table spaces.
> Then rewrite the query as a join.   The optimizer will (should) treat the new
> query accordingly.  This assumes that an efficient factoring algorithm exists
> and that temporary tables can exist in the heap.
>
> Illustration:
> SELECT ... FROM tab WHERE
> (var1 = const1 AND var2 = const2) OR
> (var1 = const3 AND var2 = const4) OR
> (var1 = const5 AND var2 = const6)
>
> SELECT ... FROM tab, tmp WHERE
> (var1 = var_x AND var2 = var_y)
>
> tmp
> var_x  | var_y
> --------------
> const1|const2
> const3|const4
> const5|const6

David, where are we on this?  I know we have OR's using indexes.  Do we
still need to look this as a fix, or are we OK.   I have not gotten far
enough in the optimizer to know how to fix the cnf'ify problem.


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Some items for the TODO list
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] subselects & views